Abstract



Low-latency communication in large-scale multiprocessors requires high- 
performance interconnection schemes. Multistage interconnection networks 
with redundant paths combine high performance with fault-tolerance, but 
exact evaluation of the blocking probability of interconnection networks with 
redundant paths is expensive. Equations for the blocking probability and 
throughput of multistage, multipath interconnection networks are derived. 
A method of approximate solution of the equations is presented, with a 
derivation of error bounds on the estimated solution. A program that solves 
the equations exactly and approximately is presented.

Chapter 1

Introduction 

1.1 Background 

The realization of low-latency communication in large-scale multiprocessors 
requires high-performance interconnection schemes. Both direct and indirect 
networks are examples of these; here our focus is on self-routing, multistage 
networks, both with unique paths and with multiple (redundant) paths. 

One popular measure of the performance of a multistage interconnection 
network is its bandwidth or throughput - that is, the expected number of mes- 
sages it delivers in each cycle, where the inputs have some given probability 
of generating a message. A related measure, from which the bandwidth 
may be calculated in some models, is the probability of successful message 
transmission or the normalized throughput - the probability that an arbi- 
trary message at an input is not blocked (and presumably queued for later 
service) by some other request in the course of delivery. The problem of 
calculating the probability of successful message transmission is more often 
referred to in the telephone switching literature by its complement - the 
blocking probability, and we shall do the same here. 

The problem of computing blocking probabilities in regular variants of 
unique-path multistage interconnection networks has been extensively
studied. These networks were called Banyan networks by Goke and Lipovski
[9]. Patel [21] and Kruskal and Snir [15] in particular presented expressions 
for the probability of successful message transmission of delta networks, 
which are a particular regular variant of Banyan networks. Multiproces- 
sors have been built using such regular Banyan networks for interconnection 
[22, 24]. A later chapter presents a method for calculating the exact
blocking probabilities of unbuffered Banyan networks that applies not only for
delta networks, but in the general case of any unique-path network. The 
method applies where sources generate messages with different probabilities, 
as well as where different destinations have different probabilities of having 
messages addressed to them. 

However, precisely because Banyan networks are unique-path networks, 
they are not inherently fault-tolerant. The failure of a switching element 
will necessarily cut off communication between at least one message source 
and one message sink in the network. A scheme that allows replacement of 
failed components with idle spares must be used to maintain connectivity. 
This is the approach used in, e.g., the extra-stage cube network [1], or in 
the dynamic redundancy network [13], both of which emulate a (Banyan) 
indirect cube network and provide fault-tolerance by reconfiguring in the 
presence of faults. 

An alternative to the maintenance of idle spares is to make active use 
of the "spares" to increase bandwidth, by building a multipath network. 
Here we mean that, in the course of normal (fault-free) communication, the 
redundant paths are used in routing packets to their destinations. Some 
examples of these are the augmented delta network [8], the multibutterfly 
network [26], and the merged delta network [23]. 

Both fault-tolerance and performance can be enhanced with the addition 
of multiple paths. Unfortunately, multipath networks create problems for 
the traffic theorist. In a Banyan network, if one assumes messages at the 
inputs are generated by independent processes, the presence or absence of 
messages at the inputs of any switch in the network is independent of the 
presence or absence of messages at the other inputs of that switch.¹ Thus
the analysis of blocking probabilities in Banyan networks is simplified, and 
polynomial-time algorithms exist for calculating the exact blocking proba- 
bility [14]. When multiple paths are allowed, independence is violated. 

The author has found in the literature no polynomial-time algorithm 
that calculates the exact blocking probability of a multipath network, nor 
any proof that the problem is NP- or #P-hard. The method described in a 
later chapter, for synchronous, packet-switched multipath networks, requires 
the solution of a number of equations that is exponential in the number 
of communications channels entering a stage in the network. A program 
that automatically solves these equations exactly, given a description of 
the network, is presented in what follows; but it cannot be used on large
networks, as the running time grows too quickly.

Thus an approximation method must be used to estimate the block- 
ing probability of larger networks. The exact solution remains useful, not 
only because it is used in the approximation method, but because it allows 
some evaluation of approximation methods through comparison with exact 
solutions for small problems. We consider two approximation methods. 

The first is direct simulation of the network, where sample input loads 
are selected and offered to the simulation, the fraction of messages blocked 
in each is calculated, and a blocking probability is estimated. The second, 
which is more satisfactory because it achieves the same error bounds in less 
running time than does direct simulation, is approximation of the solution 
to the equations, by a Monte Carlo method that we shall describe. This is 
similar to the approach taken by Harvey and Hills in [11]. Harvey and Hills 
were considering circuit-switched telephone networks with unique paths; but 
their approach, which was to find approximate solutions of exact equations, 
rather than exact solutions to approximate equations, can still be of use 
here. 

1.2 Prior Work 

The earliest work in analysis of the performance of interconnection networks 
was driven by the need to efficiently switch telephone traffic. Some of the 
earlier work on interconnection networks and their performance, by Clos [7] 
and Benes [3], concentrated on the design of non-blocking networks, networks 
for which a connection that constitutes a bijective mapping from sources to 
destinations can always be accomplished without blocking. 

Non-isochronous applications such as shared-memory references in a mul- 
tiprocessor can better tolerate blocking, and thus often use blocking variants 
of Banyan networks, as presented by Goke and Lipovski [9]. 

Patel [21] presented a probabilistic analysis of the blocking probability 
of delta networks, a subset of the more general class of Banyan networks. 
His work assumed that all sources transmit with uniform probability, and 
that all destinations are selected with uniform probability. Bhuyan [5] has 
extended Patel's work to include analysis of the case where each processor 
has a single favorite destination that is not the favorite destination of any 
other processor. Kruskal and Snir [15] have extended Patel's work by finding 
an asymptotic expression for the blocking probability in networks with large 
numbers of stages.

Analyses that model the buffering that must be used due to blocking
in Banyan networks have also been developed; two recent examples include 
the work of Merchant [18] and Lin and Kleinrock [17]. These models can- 
not be used for multipath networks, however, due to the above-mentioned 
correlation of channel loads in a multipath network. 

The literature on performance of multistage multipath networks is more 
sparse. Specific topologies are usually simulated, as in [2], [8], [23], and [16]. 
A similar problem has been studied in the context of telephone switching 
systems [12]. However, in telephone switching systems the model is one of 
a circuit-switched network where the holding time for circuits varies. Fur- 
thermore, in the methods described in [12], it is assumed that the networks 
modeled are symmetric; because there are classes of asymmetric networks 
that are of interest,² and because we are partly interested in calculating
blocking probabilities in the presence of (asymmetric) faults, these methods 
are not satisfactory. 

1.3 Motivation 

Our goal in this work is to provide a tool that can be used by multiprocessor 
architects to easily compare the performance of competing multistage inter- 
connection network structures. A secondary goal is to provide an analysis 
that highlights some of the aspects of interconnection network structure that 
have particular bearing on performance. 

Almost all Banyan networks used in multiprocessors to date have been 
delta or omega networks, and the performance of these has been studied 
extensively. Our contribution is in providing a method of calculating the 
throughputs of Banyan networks of arbitrary interconnection structure and 
with unusual switching components. The method allows easy modeling of 
cases with general destination distributions and general source transmission 
probabilities. 

Multipath, multistage network performance has been less widely stud- 
ied. The correlation of channel loads can have significant effects on the 
performance of these networks. The methods we develop calculate the joint 
probability mass function of groups of channels between stages of the net- 
work to allow calculation of multipath network performance parameters. We 
hope that the multipath network designer who wishes to examine the results 
of a design decision will be able to achieve some insight from our model.

Random interwiring of multipath networks (as described in [16]) for
fault-tolerance yields a large space of possible network structures; one might 
generate a number of these, insert faults randomly and select the one with 
the best performance. In [2], Chong et al. describe the use of simulation to 
evaluate different circuit-switched multipath networks, including randomly- 
and deterministically-interwired networks. The method we develop allows a 
quick measure of performance in fewer steps than does direct simulation of 
the network. 

1.4 Approach 

We use a simple model of offered traffic in our calculation of blocking prob- 
abilities for both Banyan and multipath networks. In our model, although 
different inputs in the network can have varying probabilities of transmis- 
sion, we assume that the messages presented at the inputs to the network 
are produced by independent memoryless processes. 

This model is known to be optimistic. The throughput calculated in an 
analysis using this model will be higher than the throughput calculated in 
simulations that include buffering, or in more detailed analyses that model 
buffering. We can understand one reason for the optimism of the model 
by considering that it cannot account for multiple conflicting requests pre- 
sented to the network. In the case of, say, a three-way collision between 
requests competing for the same resource in one cycle, only one request can 
be serviced, and there will necessarily be a collision again at the next cycle 
between the remaining requests. 

Patel has noted the optimism of this memoryless model and comments 
that in his simulations that took buffering into account, the probability 
of successful message transmission varied only slightly from that predicted 
by the memoryless model [21]. Nussbaum et al. examined the analogous 
assumption for circuit-switched interconnects and reported that the error in 
the memoryless model was at all loads less than 10%, and suggested that 
for most purposes the memoryless model should probably be preferred for 
its simplicity [20]. 

Bhandarkar examined in particular the probability that a memory el- 
ement in a distributed-memory multiprocessor system would be busy, and 
compared his analysis, which did model buffering of blocked requests, with 
a memoryless model [4]. His conclusion was that where the ratio of the 
number of memories to the number of processors was greater than 0.75,
the expected number of busy memories in the memoryless model is always
within 6 to 8% of that in the buffering model. 

The results reported by Chang et al. were similar: in examining the 
throughput of multiprocessor memories, they found that a memoryless model 
was always 6 to 8% more optimistic than the results they generated with an 
analysis that modeled queueing of memory requests [6]. 

Given that our primary goal is to provide multiprocessor architects with 
a means of comparing the performance of alternative network structures, 
we deemed the known optimism of the memoryless model to be worth the 
simplicity it affords, especially in view of the complexity of the problem of 
deriving blocking probabilities in multipath networks. 

1.5 Outline 

In the remainder of this document, we further define the problem of calculat- 
ing the throughput and blocking probability of a multistage interconnection 
network and present methods for solving it. 

In Chapter 2 we define the problem and our model specifically enough to 
allow the description of a method for analyzing the performance of Banyan 
networks. Chapter 3 presents that method, as well as a program that cal- 
culates the performance parameters numerically or symbolically. 

Chapter 4 further defines our model to include multipath networks and 
presents equations for exactly analyzing the performance of multipath net- 
works. Chapter 5 presents means of approximating the performance pa- 
rameters of multipath networks. Finally, we have included a listing of our 
procedures for Banyan network performance evaluation in an appendix, as 
these were compact enough to make such presentation practical. 



Chapter 2 

Problem Statement 



2.1 The Model 

An indirect network is one in which the network switching elements are 
segregated from the inputs and outputs of the network, as in Figure 2.1. 
The message sources inject messages into the network at the inputs, which 
in Figure 2.1 are depicted on the left side and are labeled $I_0$ through $I_7$.
Messages are routed through the network and arrive at the message sinks on
the right side, labeled $O_0$ through $O_7$. In a multiprocessor, the network input
channels might connect to 8 processing elements, and the output channels 
might connect to the same 8 processing elements. 

The particular class of indirect networks that we model is the class of 
multistage, unbuffered, synchronous, packet-switched networks. Such a net- 
work might look like the one depicted in Figure 2.2. This network has 
multiple stages: if we consider the stage consisting of all the sources to be 
stage 0, then stage 1 consists of the column of switching elements connected 
directly to the sources; stage 2 the column of switches to the right of stage 
1, etc. 

Figure 2.1: An indirect network is one in which switching elements are
segregated from the inputs and outputs of the network. Messages enter the
network through the input channels on the left side, and are routed to the
output channels on the right side.

Figure 2.2: An 8x8 Banyan network. This is a unique-path network called
an indirect cube; multipath networks will be treated in Chapter 4.

The networks we consider are self-routing: each message contains the
information necessary to route the message from the source where it is
injected to the sink that is its destination. No global information is used in
the routing process, so that the probability mass function of the loads on
the output channels of a switch can be calculated from the probabilities of
the loads on the input channels. As a simple example, in the indirect cube
of Figure 2.2, 2x2 switching elements route on individual bits of the
destination address, starting with the low-order bit. There are $\log_2 8 = 3$
address bits; the switching elements at stage 0 route on the $2^0$'s bit, those
at stage 1 on the $2^1$'s bit, and those at stage 2 on the $2^2$'s bit. A cleared
address bit indicates that the message should be routed through the upper
channel; a set address bit indicates that it should be routed through the
lower channel. Thus a message addressed to destination $6 = 110_2$ leaves
stage 0 through an upper channel, stage 1 through a lower channel, and
stage 2 through a lower channel.
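
The bit-by-bit routing rule described for the indirect cube of Figure 2.2
can be sketched in a few lines. This is only an illustration: the function
name route_directions and the string labels "upper" and "lower" are
hypothetical, and stages are numbered from 0 as in the example of
destination 6.

```python
def route_directions(destination: int, stages: int) -> list[str]:
    """Return the output channel taken at each stage, low-order bit first."""
    directions = []
    for stage in range(stages):
        # Stage s examines the 2**s bit of the destination address.
        bit = (destination >> stage) & 1
        # A cleared bit selects the upper channel, a set bit the lower one.
        directions.append("lower" if bit else "upper")
    return directions


# Destination 6 = 110 in binary, routed through three stages.
print(route_directions(6, 3))
```

Because the decision at each stage depends only on the message's own
address, no global state is consulted, which is the property the analysis
relies on.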

Blocking occurs in the 2x2 crossbar when two messages arriving at 
the inputs are both to be routed through the same output channel. Both 
requests cannot be serviced, and so one of the messages is routed through 
the output channel, and the other is blocked. In our model, both messages 
have equal likelihood of being routed through the output channel, and many 
switching elements behave this way; but one might easily modify the analyses 
we present to change this assumption. 

We also consider networks in which the channels between stages can carry 
more than one message. Kruskal and Snir referred to such networks as dilated 
networks [15], and we follow their lead here; furthermore we call switching 
elements in which the output ports can pass more than one message dilated 
switching elements. We refer to each of the dilated output ports as a logical 
direction. If a switch has $N$ input ports, each of which can receive a single
message, and $M$ output ports, each of which can send up to $K$ messages
simultaneously, we call it an $N \times M$, dilation $K$ switch.

We calculate the throughput of the network under the following assump- 
tions: 

• The processes generating messages at the sources are independent and
memoryless. With some specified probability $p_i$, each source $i$
generates or fails to generate a single message at the beginning of each
cycle. Each generated message is directed to a stage 1 switch.

• The network is synchronous: at each cycle messages move from stage 
i to stage i + 1. 

• The network is treated as unbuffered (as described in Section 1.4): if a 
message is blocked at some stage, it is considered to be lost, and does 
not in any way affect the future states of the system. 

• If $A_0, A_1, \ldots$ are random variables representing the addresses of
messages generated in some particular cycle by message sources 0, 1, ...,
then the $A_i$ are independent and identically distributed; the
distribution can be specified as a parameter of the model.

We define our model further in Chapter 4, extending it as necessary for
multipath networks. 
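
The assumptions above are enough to write a direct simulation of an
unbuffered unique-path network. The sketch below is not the program
described in this document. It assumes a butterfly-style wiring in which
the switch at stage s pairs the two lines whose indices differ in bit s,
which may differ from the exact wiring of Figure 2.2, and it assumes a
uniform destination distribution and the same generation probability p
at every source. The function name simulate_throughput and its parameters
are hypothetical.

```python
import random


def simulate_throughput(p, stages=3, cycles=20000, seed=0):
    """Estimate the mean number of messages delivered per cycle."""
    rng = random.Random(seed)
    n = 1 << stages
    delivered = 0
    for _ in range(cycles):
        # Each source independently generates a message with probability p;
        # destinations are independent and uniform (an assumption here).
        messages = {src: rng.randrange(n) for src in range(n)
                    if rng.random() < p}
        for stage in range(stages):
            contenders = {}
            for line, dest in messages.items():
                # Route on bit `stage`: copy that bit of the destination
                # into the line index.
                target = (line & ~(1 << stage)) | (dest & (1 << stage))
                contenders.setdefault(target, []).append(dest)
            # Unbuffered model: one contender wins with equal likelihood,
            # the others are lost.
            messages = {t: rng.choice(ds) for t, ds in contenders.items()}
        delivered += len(messages)
    return delivered / cycles


print(simulate_throughput(0.5))
```

Dividing the estimate by the expected number of offered messages gives an
estimate of the probability of successful message transmission.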

2.2 The Problem 

We are interested in deriving the bandwidth, or throughput, of a multistage 
interconnection network - that is, the expected number of messages it
delivers in a cycle. We calculate this number by finding the probability mass
functions¹ of the loads on channels leading to sinks.

Suppose that the network has M sources. Call the probability that the
ith source generates a message in a given cycle $P_i$. If we say that B is
the bandwidth and $P_S$ is the probability of successful message transmission,
then we may calculate $P_S$ as the ratio of B to the expectation of the input
message loading. $P_S$ will vary with the input loading, because of internal
blocking in the network. B, too, will vary because of internal blocking and
also directly with the number of messages entering the network. Thus we
can better express $P_S$ and B as functions of the $P_i$, giving us the relation:

$$P_S(P_0, P_1, \ldots, P_{M-1}) = \frac{B(P_0, P_1, \ldots, P_{M-1})}{\sum_{i=0}^{M-1} P_i} \tag{2.1}$$

Thus our problem is finding the probability mass functions of the loads on 
channels leading to sinks. These probability mass functions can also be used 
to specify other information besides mean throughput; if the network is not 
symmetric, or if a non-uniform destination address distribution for injected 
messages is specified, or if different sources are specified to have different 
probabilities of message generation, then the loads on individual channels 
leading to sinks will be of interest in finding the effects of the asymmetries on
traffic to particular destinations. 

The quantities B and $P_S$ will typically vary smoothly with the source
transmission probabilities $P_i$. Let us consider a simple case. If the
destination distribution is uniform, all sources $i$ have equal probability of
generating messages, and we vary $P_i$ between 0 and 1 for the network of
Figure 2.2,
the resulting graphs for the bandwidth and probability of successful message 
transmission are as shown in Figures 2.3 and 2.4, respectively. 

¹ Joint probability mass functions, in the case of multipath networks.

Figure 2.3: Bandwidth (labeled E[A]) plotted versus message generation
probability (labeled P0) for the network of Figure 2.2. Here the destination
address distribution is uniform.

Figure 2.4: Probability of successful message transmission (labeled
P{Success}) plotted versus message generation probability (labeled P0) for
the network of Figure 2.2, with a uniform destination address distribution.

The probability of successful message transmission is close to 1 when
there are very few messages injected into the network, because there is little
blocking in a nearly empty network. $P_S$ decreases as the number of
messages injected into the network increases. The bandwidth or throughput
starts at 0, when no messages are being injected into the network, and,
because of blocking, increases less than linearly as the probability of message
transmission increases.
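
The shape of these curves can be reproduced for the uniform case with a
short calculation. The sketch below does not use the method of this
document; it uses the well-known stage-by-stage recurrence for networks of
2x2 crossbars under uniform traffic, in which an output channel is idle
only if neither input directs a message to it, and then applies the ratio
of Equation 2.1. The function name delta_network_performance is
hypothetical, and every source is assumed to have the same generation
probability p.

```python
def delta_network_performance(p, stages=3):
    """Return (bandwidth B, success probability P_S) for 2**stages sources."""
    load = p
    for _ in range(stages):
        # Each input sends a message toward a given output with
        # probability load / 2; the output is busy unless both do not.
        load = 1.0 - (1.0 - load / 2.0) ** 2
    sources = 2 ** stages
    bandwidth = sources * load
    offered = sources * p  # expectation of the input message loading
    success = bandwidth / offered if offered > 0 else 1.0
    return bandwidth, success


for p in (0.1, 0.5, 1.0):
    b, ps = delta_network_performance(p)
    print(f"P_i={p:.1f}  B={b:.3f}  P_S={ps:.3f}")
```

The printed values show the bandwidth growing more slowly than the offered
load while the probability of successful transmission falls.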



Chapter 3 

Performance of Banyan Networks¹

3.1 Introduction 

In this chapter we present a method of calculating the throughput of a 
Banyan network. As described in Section 1.2, Patel [21] and Kruskal and 
Snir [15] have presented solutions to this problem for regular variants made 
up of crossbar switching devices, but we present a method that works for 
Banyan networks of arbitrary interconnection structure and allows modeling 
of some unusual switching devices. 

Consider first the probability mass function of the message load on a
single channel in a Banyan network. The channel may either be carrying a
message, in which case its message load is one, or it may be idle, in which
case its message load is zero. Let the random variable $l$ denote the message
load. The two values that $l$ can take on partition the space of possible
loading configurations for the network into two disjoint subsets. $l$ is then
a Bernoulli random variable, and we use the notation $p_l(l_0)$ to denote the
value of its probability mass function at $l_0$.² We denote the value of the
Z-transform of $l$'s probability mass function at $z$ with the notation $p_l^T(z)$.

Our approach will be to define three operations on the probability mass
functions of channel loads. These are called bundling, switching, and
concentration. They are represented graphically as depicted in Figure 3.1. We
compose these operations to model switching elements, and further according
to the interconnection structure of the network. The result is an operation
that transforms the probability mass functions of the loads on input
channels to the probability mass function of the load on an output channel.

¹ The work described in this chapter was performed jointly with Dr. Thomas F.
Knight, Jr. and has been described in [14].

² In later sections we will also use the notation $P\{l = l_0\}$, when this is
convenient.

Figure 3.1: (a) The symbol for bundling two input bundles into one. (b)
The symbol for concentrating j channels to k channels. (c) The symbol for
switching with probability $q$ to the top output channel, and $1 - q$ to the
bottom output channel.



3.2 Loads on Banyan Network Channels at a Single Stage are Independent

We require a simple proof to proceed. We will be forming the sum of the 
loads on distinct channels in a single stage in a Banyan network, and thus 
we need to understand how, if at all, the random variables we are summing 
are correlated. It turns out that these loads are in fact independent. A proof 
for the special case of delta networks is presented in [21]; here we present a 
different proof for the general case. 

The proof is straightforward. Note first that, if messages are generated 
at source nodes by mutually independent random processes, and the sets 
of messages on distinct channels entering a switching node originate at dis- 
joint sets of source nodes, then the loads on those channels are necessarily 
independent. 

We now claim that the sets of messages on distinct channels entering 
any switching node in a Banyan network satisfy this criterion: i.e., they 
originate at disjoint sets of sources.

For, consider: if channel A and channel B are two channels entering a
switching node, and a message on channel A and a message on channel B 
originate at a single source, then it must be the case that at least two paths 
exist from that source to any sinks accessible from the switching node: one 
path that uses channel A and one that uses channel B. But this is impossible 
in a Banyan network, as Banyan networks are in fact those in which there 
is exactly one path from each source to each sink. 

Thus the sets of messages on distinct channels entering any switching 
node in a Banyan network must originate at disjoint sets of sources, and so 
the loads on the channels entering any switching node in a Banyan network 
must be mutually independent, as was to be proved. 

3.3 Bundling 

We call the operation of summing the loads on a group of channels bundling. 
We will call such a group of channels a bundle. 

Because channel loads are independent, if we are summing loads $a$ and
$b$, then we form the convolution of their probability mass functions. We use
the notation

$$B[p_a(a_0), p_b(b_0)] = p_a(a_0) * p_b(b_0) \tag{3.1}$$

Of course, this operation can be performed on bundles, as well as on
single channels. The result of bundling two bundles composed respectively
of n and m single channels is a bundle whose load can take on values
ranging from 0 through n + m. We depict in Figure 3.2 one possible loading
probability mass function of a bundle composed of 8 single channels.

In the Z-domain, bundling becomes multiplication of the Z-transforms
of the loading probability mass functions in question:

$$Z[B[p_a(a_0), p_b(b_0)]] = p_a^T(z) \cdot p_b^T(z) \tag{3.2}$$
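
As an illustration of bundling, the convolution of Equation 3.1 can be
written directly on probability mass functions stored as lists indexed by
load. The representation and the function name bundle are assumptions made
for this sketch. Bundling eight independent channels that each carry a
message with probability 1/2 should give a binomial distribution, which the
example compares against.

```python
from math import comb


def bundle(pmf_a, pmf_b):
    """Convolve two load PMFs; index i of each list is P{load = i}."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            out[i + j] += pa * pb
    return out


channel = [0.5, 0.5]  # idle or busy, each with probability 1/2
load = [1.0]          # an empty bundle carries load 0 with certainty
for _ in range(8):
    load = bundle(load, channel)

# Reference values: binomial distribution with 8 trials and p = 1/2.
expected = [comb(8, k) / 2**8 for k in range(9)]
print(max(abs(x - y) for x, y in zip(load, expected)))
```

The result has nine entries, for loads 0 through 8, in agreement with the
range 0 through n + m stated above.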

3.4 Concentration 

Suppose in an $N \times M$, dilation $K$ switch³ more than $K$ arriving messages
are to be routed in a particular logical direction. Some of the messages are
then blocked and must be dropped. We call the operation that corresponds
to this situation concentration.

³ See Section 2.1 for an explanation of dilated switches.

Figure 3.2: Loading probability mass function for an eight-channel bundle,
where each channel carries a message with probability 1/2.

More specifically, suppose we have a bundle of $N$ single channels, whose
load we call $a$, and we wish to direct messages from it into a bundle of $K$
single channels, whose load we call $b$. Of course if $N \le K$, $p_b(l_0) = p_a(l_0)$
for all loads $l_0$, because in this case none of the messages on the input
bundle will ever be blocked. If $N > K$, we calculate the probability mass
function of $b$ as follows:

• Because the output bundle carries fewer than $K$ messages exactly when
the input bundle carries fewer than $K$ messages, we have that
$p_b(l_0) = p_a(l_0)$ where $l_0 < K$.

• The output bundle will carry $K$ messages whenever the input bundle
carries at least $K$ messages; thus we have that
$p_b(K) = \sum_{l_0=K}^{N} p_a(l_0)$.

• Because the output bundle cannot carry more than $K$ messages,
$p_b(l_0) = 0$ for $l_0 > K$.

Intuitively, then, we can think of the effect of concentration on the input
loading probability mass function as truncating it at $K$, by setting
probabilities for loads greater than $K$ to 0 and adding to the probability for
$K$ the probabilities for all greater input loads. In Figure 3.3 we show the
result of concentrating to 6 channels the loading probability mass function
of Figure 3.2.

Figure 3.3: 6-concentration of the loading probability mass function of
Figure 3.2.

If $\delta(n)$ denotes the value of the unit impulse function at $n$, we can express
the loading probability mass function of an $N$-channel bundle as an impulse
train with value $k_i$ at $i$:

$$p_l(l_0) = \sum_{i=0}^{N} k_i \, \delta(l_0 - i)$$

If $u(n)$ denotes the value of the unit step function at $n$, we can express
concentration of this $N$-channel bundle to a $K$-channel bundle as follows:

$$C_{N,K}[p_l(l_0)] = p_l(l_0)\, u(K - l_0) + \left( \sum_{l_1=K+1}^{N} p_l(l_1) \right) \delta(l_0 - K) \tag{3.3}$$

In the Z-domain, we have

$$Z[C_{N,K}[p_l(l_0)]] = p_l^T(z) - \sum_{l_1=K+1}^{N} p_l(l_1)\, z^{l_1} + \left( \sum_{l_1=K+1}^{N} p_l(l_1) \right) z^K$$

Combining the two summations, we get

$$Z[C_{N,K}[p_l(l_0)]] = p_l^T(z) + \sum_{l_1=K+1}^{N} p_l(l_1) \left( z^K - z^{l_1} \right) \tag{3.4}$$
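
Concentration is simple to state on a list-based probability mass
function, and the Z-domain form of Equation 3.4 can be checked numerically
against it. In this sketch the names concentrate and z_transform are
hypothetical, the transform is taken as the sum of p(l) z^l to match the
convention used in this chapter, and the evaluation point z = 0.3 is
arbitrary.

```python
from math import comb


def concentrate(pmf, k):
    """Truncate a load PMF at k, folding all probability above k into k."""
    if len(pmf) - 1 <= k:
        return list(pmf)  # N <= K: no message is ever blocked
    return list(pmf[:k]) + [sum(pmf[k:])]


def z_transform(pmf, z):
    """Evaluate the sum over loads of p(load) * z**load."""
    return sum(p * z**load for load, p in enumerate(pmf))


# Eight-channel bundle, each channel busy with probability 1/2.
pmf = [comb(8, load) / 2**8 for load in range(9)]
k = 6
out = concentrate(pmf, k)

z = 0.3
direct = z_transform(out, z)
# Right-hand side of Equation 3.4, summing over loads K+1 through N.
via_3_4 = z_transform(pmf, z) + sum(
    pmf[l1] * (z**k - z**l1) for l1 in range(k + 1, len(pmf)))
print(out[k], abs(direct - via_3_4))
```

The concentrated function keeps the input probabilities below load 6 and
places all remaining probability at load 6, as in the 6-concentration
described above.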

3.5 Switching 

We call the elementary operation of directing the messages on a bundle to 
two other bundles of the same width as the input bundle switching. Here we 
do not mean to use the term in precisely the sense that it is used when we 
speak of, e.g., a 2x2 switch. In the elementary operation we call switching,
no blocking is modeled; no messages can be lost. What we are modeling 
instead is the direction of messages to separate ports in routing. 

We specify the probability that the messages on the input load are 
switched in the direction of the output load. This probability is calculated 
in accordance with the destination address distribution (as will be described 
in Section 3.6); but as an example, for the 2x2 crossbars in the network of 
Figure 2.2 under a uniform destination address distribution the probability 
specified for the switching operation will be 1/2. 

Thus the switching operation is performed on an input loading proba- 
bility mass function and a switching probability, and its result is an output 
loading probability mass function. Call the load on the input bundle $a$, and
that on the output bundle $b$, and say that the input bundle (and perforce
the output bundle) is composed of $N$ single channels.

We form $p_b(b_0)$ by conditioning on the number of messages on the input
bundle:

$$p_b(b_0) = \sum_{a_0=0}^{N} p_{b|a}(b_0 \mid a_0)\, p_a(a_0)$$

To evaluate the conditional probability, let $q$ be the probability that
an input message is switched to the output bundle. By independence of
message destinations, each message is switched independently, and thus the
number of messages switched to the output bundle is binomially distributed,
because it is the number of successes in $a_0$ independent Bernoulli trials with
probability $q$ of success:

$$p_{b|a}(b_0 \mid a_0) = \binom{a_0}{b_0} q^{b_0} (1 - q)^{a_0 - b_0}$$

Substituting, we have

$$S[p_a(a_0), q] = p_b(b_0) = \sum_{a_0=0}^{N} p_a(a_0) \binom{a_0}{b_0} q^{b_0} (1 - q)^{a_0 - b_0} \tag{3.5}$$

where the binomial coefficient is zero for $b_0 > a_0$.
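
Equation 3.5 translates directly into a double loop over input and output
loads. The sketch below is illustrative only; the function name switch and
the list representation of the probability mass function are assumptions.
As a check, switching the eight-channel bundle of Figure 3.2 with q = 1/2
should give a binomial distribution with success probability 1/4, because
each channel then delivers a message to the output bundle with probability
1/2 times 1/2.

```python
from math import comb


def switch(pmf_a, q):
    """Apply the switching operation with probability q to a load PMF."""
    n = len(pmf_a) - 1
    out = [0.0] * (n + 1)
    for a0, pa in enumerate(pmf_a):
        # Given a0 input messages, the output load is binomial(a0, q).
        for b0 in range(a0 + 1):
            out[b0] += pa * comb(a0, b0) * q**b0 * (1 - q) ** (a0 - b0)
    return out


pmf_a = [comb(8, k) / 2**8 for k in range(9)]
pmf_b = switch(pmf_a, 0.5)
mean_a = sum(k * p for k, p in enumerate(pmf_a))
mean_b = sum(k * p for k, p in enumerate(pmf_b))
print(mean_a, mean_b)
```

The output bundle has the same width as the input bundle, so the returned
list has the same length as the input list.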




In the Z-domain, we take an analogous approach. Note that the number
of messages routed to the output channel is the sum of a random number
of identically distributed random variables. The number of summands is
the number of messages on the input load. The summands themselves are
Bernoulli random variables that are 1 when the message in question is routed
to the output bundle and 0 when it is not.

If we use the random variable $c$ to denote one of the summands, its
probability mass function is given by

$$p_c(c_0) = (1 - q)\,\delta(c_0) + q\,\delta(c_0 - 1)$$

with Z-transform

$$p_c^T(z) = 1 - q + qz$$

Thus we have

$$Z[S[p_a(a_0), q]] = p_a^T(p_c^T(z)) = p_a^T(1 - q + qz) \tag{3.6}$$

Of course, if we cascade $K$ switching operations whose probabilities are
$q_1, q_2, \ldots, q_K$, the effect on the probability that an individual message is
routed to the output bundle is the same as if we performed one switching
operation with $q = \prod_{i=1}^{K} q_i$.

The (predictable) effect of switching upon a loading probability mass 
function is to decrease the mean. The effect on the distribution of Figure 3.2 
of switching with q = 1/2 is shown in Figure 3.4.

Figure 3.4: The effect of switching the loading probability mass function of
Figure 3.2 with probability 1/2.
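The switching operation of Equation (3.5) can be sketched numerically. The following is a minimal illustration in Python rather than the Mathematica package of Appendix A; the function names `switch` and `mean` and the example load are assumptions made for this sketch. It checks the two properties noted above: switching scales the mean load by q, and cascaded switching operations are equivalent to a single operation with the product of the probabilities.

```python
from math import comb


def switch(pmf, q):
    """Apply the switching operation of Equation (3.5).

    pmf[a] is the probability that the input bundle carries a messages;
    q is the probability that a message is switched to the output bundle.
    Returns the PMF of the output-bundle load (same length as pmf).
    """
    width = len(pmf) - 1  # bundle width N
    out = [0.0] * (width + 1)
    for a, p_a in enumerate(pmf):
        for b in range(a + 1):
            # b of the a input messages are routed to this output bundle.
            out[b] += p_a * comb(a, b) * q**b * (1 - q) ** (a - b)
    return out


def mean(pmf):
    """Expected load of a loading probability mass function."""
    return sum(k * p for k, p in enumerate(pmf))


# Hypothetical load on a 4-wide bundle (illustrative values only).
load = [0.1, 0.2, 0.4, 0.2, 0.1]
once = switch(load, 0.5)
twice = switch(switch(load, 0.5), 0.5)
direct = switch(load, 0.25)
```

Here `twice` and `direct` agree to within rounding error, which is the cascade property stated above for K = 2.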



3.6 Deriving Switching Probabilities from Message
Destination Distributions

The technique we use for deriving switching probabilities from message des- 
tination distributions has also been used by Lin and Kleinrock in [17]. 

As described in Section 2.1, the addresses of distinct messages injected
into the network are independent and identically distributed. Suppose that
the message sinks are numbered $0, 1, \ldots, N-1$, and consider a switch $X$
for which the set of accessible message sinks is $S$. Suppose that $X$ has $M$
output ports. By uniqueness of paths in a Banyan network, the ports must
have disjoint sets $S_1, S_2, \ldots, S_M$ of accessible destinations, and because the
destinations accessible through the output ports are all the destinations
accessible from $X$, we must have that $\bigcup_{i=1}^{M} S_i = S$. An example is
depicted in Figure 3.5.

We wish to know the probability that an arbitrary message arriving 
at switch X is directed in direction i. Suppose that some message W with 
destination given by the random variable D is injected into the network. The 
value we are looking for is the conditional probability that W is addressed 
to a destination in the set $S_i$, given that it has arrived at switch $X$. We
have

$$P\{D \in S_i \mid D \in S\} = \frac{P\{(D \in S_i) \cap (D \in S)\}}{P\{D \in S\}} = \frac{P\{D \in S_i\}}{P\{D \in S\}} = \frac{\sum_{s \in S_i} P\{D = s\}}{\sum_{s \in S} P\{D = s\}} \qquad (3.7)$$

where the second equality holds because $S_i \subseteq S$, and the last expression
follows from mutual exclusivity of destinations.

Figure 3.5: The destination set of the switch X is {1,3,5,7}. The destination
set of the upper channel is {1, 5}; that of the lower channel is {3, 7}.
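Equation (3.7) is straightforward to evaluate mechanically. The sketch below is illustrative Python, not part of the package of Appendix A; the function name `switching_probability` and the non-uniform "hot" destination distribution are assumptions chosen for the example. The sink sets are those of Figure 3.5.

```python
from fractions import Fraction


def switching_probability(dest_pmf, port_sinks, switch_sinks):
    """Equation (3.7): P{D in S_i | D in S}.

    dest_pmf[s] is P{D = s}; port_sinks is S_i and switch_sinks is S.
    """
    port_sinks = set(port_sinks)
    switch_sinks = set(switch_sinks)
    if not port_sinks <= switch_sinks:
        raise ValueError("S_i must be a subset of S")
    reachable = sum(dest_pmf[s] for s in switch_sinks)
    if reachable == 0:
        raise ValueError("no message can arrive at this switch")
    return sum(dest_pmf[s] for s in port_sinks) / reachable


# Sink sets of Figure 3.5.
S = {1, 3, 5, 7}
S_upper = {1, 5}
S_lower = {3, 7}

# Uniform destination distribution over eight sinks.
uniform = [Fraction(1, 8)] * 8
upper_uniform = switching_probability(uniform, S_upper, S)

# Hypothetical non-uniform distribution: sink 1 is three times as likely.
hot = [Fraction(1, 10)] * 8
hot[1] = Fraction(3, 10)
upper_hot = switching_probability(hot, S_upper, S)
lower_hot = switching_probability(hot, S_lower, S)
```

Under the uniform distribution the upper port receives a message with probability 1/2; under the assumed skewed distribution the value shifts to 2/3, and the two port probabilities still sum to 1.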



3.7 Example: the $2^k \times 2^k$ Crossbar

As an example of both the symbolic and numeric use of our method, we
derive a well-known expression for the throughput of the $2^k \times 2^k$ crossbar.
We form a schematic representation of the crossbar with a combination
of our operators. First we construct a bundle of $2^k$ channels by bundling the
single-channel inputs $k$ times. Then we switch the messages on the bundle
$k$ times, to form $2^k$ bundles, each of which can hold $2^k$ messages. Finally we
concentrate these $2^k$-wide bundles to single channels, thereby modeling the
blocking that takes place in the crossbar. Figure 3.6 shows the result for an
8x8 crossbar.

Figure 3.6: Schematic representation of an 8 X 8 crossbar network. Here
we show the switching probabilities set for a uniform destination address
distribution.

For brevity's sake, in our analysis we assume that all input channels have
a single probability Q of transmitting, and that the destination address
distribution is uniform. It will be evident that the derivation would otherwise
proceed in the same fashion, but would be more lengthy.

Suppose that the input channels have probability Q of transmitting dur- 
ing a cycle. If we call the load on an input channel y, the loading probability 
mass function for an input channel will then be 

$$p_y(y_0) = Q\,\delta(y_0 - 1) + (1 - Q)\,\delta(y_0)$$

with Z-transform

$$p_y^T(z) = Qz + (1 - Q)$$

Bundling $k$ times, we get for the transform of the probability mass function
of the load $x_c$ on the bundle entering the switches

$$p_{x_c}^T(z) = \left(p_y^T(z)\right)^{2^k}$$



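Let $x_s$ be the load on a channel after the stages of switching, but before
concentration. We switch $k$ times with probability 1/2 at each stage, the
result being the same as switching once with probability $1/2^k$:

$$p_{x_s}^T(z) = p_{x_c}^T\!\left(1 - \frac{1}{2^k} + \frac{z}{2^k}\right) = \left(1 - \frac{Q}{2^k} + \frac{Qz}{2^k}\right)^{2^k}$$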
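To make the expression clearer, we substitute $M = 2^k$, rearrange, and
invert the Z-transform:

$$p_{x_s}^T(z) = \left(\frac{Q}{M}\right)^{M} \left[z + \left(\frac{M}{Q} - 1\right)\right]^{M} = \left(\frac{Q}{M}\right)^{M} \sum_{l=0}^{M} \binom{M}{l} \left(\frac{M}{Q} - 1\right)^{l} z^{M-l}$$

So,

$$p_{x_s}(x_{s0}) = \left(\frac{Q}{M}\right)^{M} \sum_{l=0}^{M} \binom{M}{l} \left(\frac{M}{Q} - 1\right)^{l} \delta(x_{s0} - (M - l))$$

We can save ourselves some work in performing the concentration from
$2^k$ (that is, $M$) channels to one channel by making use of the following
device. We note that, because we are concentrating to a single channel, the
only possible loads for the channel are 0 and 1. We recall from Section 3.4
that concentration will retain the probability for a load of 0, as 0 is less
than the maximum load on the channel. The probability for a load of 1 will
necessarily be the complement of that for 0. First we take the probability
that $x_s = 0$:

$$p_{x_s}(0) = \left(\frac{Q}{M}\right)^{M} \sum_{l=0}^{M} \binom{M}{l} \left(\frac{M}{Q} - 1\right)^{l} \delta(-(M - l))$$

We simplify the expression by noting that the terms where $l \neq M$ will all
be 0:

$$p_{x_s}(0) = \left(\frac{Q}{M}\right)^{M} \left(\frac{M}{Q} - 1\right)^{M} = \left(1 - \frac{Q}{M}\right)^{M}$$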

If we call the load on an output channel $l$, the loading probability mass
function for an output channel is then given by

$$p_l(l_0) = \left(1 - \frac{Q}{M}\right)^{M} \delta(l_0) + \left(1 - \left(1 - \frac{Q}{M}\right)^{M}\right) \delta(l_0 - 1) \qquad (3.8)$$

The expected load on an output channel is then

$$E[l] = 1 - \left(1 - \frac{Q}{M}\right)^{M}$$

There are $M$ output channels, so the expected load on all of them, or the
throughput of the crossbar, is

$$M E[l] = M \left(1 - \left(1 - \frac{Q}{M}\right)^{M}\right)$$

The expected load on an input channel was $Q$, so that the total expected
input load is $MQ$. We can now use Equation (2.1) to derive the probability
of successful message transmission in an $M \times M$ crossbar:

$$P_S = \frac{M E[l]}{MQ} = \frac{1 - \left(1 - \frac{Q}{M}\right)^{M}}{Q}$$

Figure 3.7: The probability of successful message transmission (labeled
P{Success}) as a function of the probability that a source is transmitting
(labeled $Q_i$), in an eight-by-eight crossbar network with a uniform
destination distribution.

We plot the probability of successful message transmission against the source
transmission probability in Figure 3.7.
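The closed form for the crossbar can be checked against a direct numerical composition of the three elementary operations. The sketch below is illustrative Python, not the Mathematica package of Appendix A. It assumes that bundling two independent loads convolves their probability mass functions (consistent with the product of Z-transforms used above) and that concentration to a given width folds all probability of larger loads into that width (consistent with the remark from Section 3.4 recalled above). All function names are chosen for this sketch.

```python
from math import comb


def bundle(p1, p2):
    """PMF of the total load of two independent bundles (convolution)."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out


def switch(pmf, q):
    """Equation (3.5): each message is kept independently with probability q."""
    out = [0.0] * len(pmf)
    for a, p_a in enumerate(pmf):
        for b in range(a + 1):
            out[b] += p_a * comb(a, b) * q**b * (1 - q) ** (a - b)
    return out


def concentrate(pmf, width):
    """Fold the probability of every load above `width` into load `width`."""
    return pmf[:width] + [sum(pmf[width:])]


def crossbar_output_pmf(k, Q):
    """Loading PMF of one output channel of a 2^k x 2^k crossbar."""
    pmf = [1 - Q, Q]             # one input channel
    for _ in range(k):           # bundle k times: 2^k channels
        pmf = bundle(pmf, pmf)
    for _ in range(k):           # switch k times with probability 1/2
        pmf = switch(pmf, 0.5)
    return concentrate(pmf, 1)   # blocking happens here


def success_probability(k, Q):
    """Throughput divided by total expected input load."""
    M = 2**k
    throughput = M * crossbar_output_pmf(k, Q)[1]
    return throughput / (M * Q)


def closed_form(k, Q):
    """The expression for P_S derived in the text."""
    M = 2**k
    return (1 - (1 - Q / M) ** M) / Q
```

For example, `success_probability(3, 0.7)` and `closed_form(3, 0.7)` agree to within rounding error for the 8x8 crossbar.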

3.8 Automatic Calculation of Numerical Values 
for Performance Parameters 



We present in Appendix A a package of Mathematica procedures that imple- 
ment the elementary operations we have described. Of course the operations 
are easily implemented in other languages, but it is advantageous to use a 
symbolic algebra package if one wishes to derive symbolic expressions for 
performance. 

We can use this package to implement procedures that operate on source 
loading probability mass functions and return the loading probability mass 
functions for channels leading to sinks. As an example, we turn again to the 
network of Figure 2.2. The tree whose root is one of the sinks and whose 
leaves are the sources is depicted in Figure 3.8. 

Figure 3.8: The tree of channels leading to a sink in the network of
Figure 2.2.

Assuming for clarity's sake that the destination address distribution is
uniform, we might use our package to model the 2x2 crossbar as follows:

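crossbar2x2[PMF1_, PMF2_] :=
    (* bundle the two inputs, switch with probability 1/2, *)
    (* then concentrate to a single output channel *)
    concentrate[switch[bundle[PMF1, PMF2], 1/2], 1]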



If we also assume (again, in the interests of brevity; it will be clear that 
the calculation in the general case is no more complex) that all sources 
transmit with equal probability, we can take advantage of the symmetry of 
the network to calculate the loading probability mass function of a channel 
leading to a sink as follows: 

threeStageDelta[q_] :=
    Block[{inputPMF, stage1PMF, stage2PMF},
        (* a source is idle with probability 1-q, transmitting with q *)
        inputPMF := {(1-q), q};
        stage1PMF := crossbar2x2[inputPMF, inputPMF];
        stage2PMF := crossbar2x2[stage1PMF, stage1PMF];
        crossbar2x2[stage2PMF, stage2PMF]
    ]

Here the probability that a source is transmitting is specified as the input
parameter; three levels of switching are performed; the result of the last is
returned.

We may calculate the resulting bandwidth and probability of successful 
message transmission from Equation (2.1), as is done in Section 3.7. The 
results are plotted in Figures 2.3 and 2.4, on page 11. 

3.9 Modeling an Unusual Switching Component 

We use an example to illustrate the modeling of an unusual switching
component, an 8 X 4, dilation 2 switch.⁴ Such switches are more usually used
in multipath networks, as we shall see in Chapter 4, but Banyan networks
with replicated links are not unknown, and Kruskal and Snir have analyzed
regular variants in [15].

3.9.1 An Application for an 8 x 4, dilation 2 Switch 

We can use standard 4x4 crossbars to build a 16 X 16 indirect binary 
cube network, as depicted in Figure 3.9. The methods of analysis of the 
performance of this network follow directly those of Sections 3.7 and 3.8. 

As an alternative, we might choose to use a different sort of switching 
element in the first stage, to improve performance. This switching element 
- an 8 X 4, dilation 2 switch - has eight input channels, but switches mes- 
sages in only four logical directions, with two output ports for each of these 
logical directions. If only one message is switched in a particular direction, 
the output port is picked randomly. If two messages are switched in the 
direction, both ports are used; if more than two messages are switched in 
the direction, the excess messages are blocked. 

In Figure 3.10, we show how we might modify the first stage of the 16 X 16 
indirect cube network to make use of the dilated switching component. The 
second stage must still use 4x4 crossbars, to select the particular output 
channel to which the message is directed. 

Although it might appear that we have constructed a multipath network
here, in fact we have not. The numbers appearing next to output ports on
the dilated components in Figure 3.10 are logical direction numbers, and it
is to be noted that both outputs for a particular logical direction lead to
the same second-stage switch; the model will reflect this. Thus we have four
two-channel bundles leading from each first-stage component.

⁴ See Section 2.1 for an explanation of this terminology.

Figure 3.9: A 16 X 16 indirect binary cube network built from standard 4x4
crossbars.

Figure 3.10: A 16 X 16 indirect binary cube network with the first stage
built from 8x4, dilation 2 switches, and the second stage from standard
4x4 crossbars.

Figure 3.11: Schematic representation of an eight-by-four, dilation two
switching component. The switching probabilities are for a uniform
destination address distribution.



3.9.2 Deriving Expressions for the Performance of the 8x4, 
Dilation 2 Switch 

A schematic model of the 8x4, dilation 2 switch is shown in Figure 3.11. 
Note that, in our model, the only difference between this component and 
the 8x8 crossbar of Figure 3.6 is that there are only two stages of switching, 
and the final concentration is to two channels, rather than to one. Here we 
gain an intuition from our model: we noted that concentration was where 
blocking occurred. Because there is less concentration in the new network, 
there will be less blocking. 

The derivation follows that of the $2^k \times 2^k$ crossbar in Section 3.7. Again,
for brevity's sake we assume a uniform destination distribution. Call the
load on an input channel $y$. Assuming all inputs transmit with probability
$Q$, the loading probability mass function for an input channel is

$$p_y(y_0) = Q\,\delta(y_0 - 1) + (1 - Q)\,\delta(y_0)$$

with Z-transform

$$p_y^T(z) = Qz + (1 - Q)$$

Call the load on the bundle entering the switches $x_c$. The transform of the
probability mass function of $x_c$ is then

$$p_{x_c}^T(z) = \left(p_y^T(z)\right)^{8}$$

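We switch twice with probability 1/2 each time, the result being the same
as switching once with probability 1/4. If $x_s$ is the load on a channel after
the two stages of switching, we have

$$p_{x_s}^T(z) = \left(p_y^T\!\left(\frac{z + 3}{4}\right)\right)^{8} = \left(\frac{Q}{4}\right)^{8} \left(z + \frac{4}{Q} - 1\right)^{8}$$

Now we invert the transform:

$$p_{x_s}(x_{s0}) = \left(\frac{Q}{4}\right)^{8} \sum_{l=0}^{8} \binom{8}{l} \left(\frac{4}{Q} - 1\right)^{l} \delta(x_{s0} - (8 - l))$$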



We concentrate to two channels here, so that it still saves us some work 
to use the technique we did for the crossbar, but it will be a little more 
complicated to do so. 

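We take the probability that $x_s = 0$ first. The sum will be zero whenever
$l \neq 8$, giving us:

$$p_{x_s}(0) = \left(\frac{Q}{4}\right)^{8} \left(\frac{4}{Q} - 1\right)^{8} = \left(1 - \frac{Q}{4}\right)^{8}$$

For $x_s = 1$, the sum will be zero whenever $l \neq 7$, giving us:

$$p_{x_s}(1) = 8 \left(\frac{Q}{4}\right)^{8} \left(\frac{4}{Q} - 1\right)^{7} = 2Q \left(1 - \frac{Q}{4}\right)^{7}$$

Call the load on a two-channel output bundle $l$. We know that $p_l(0) =
p_{x_s}(0)$ and $p_l(1) = p_{x_s}(1)$. The only other case for a two-channel bundle is
$l = 2$, so the probability for $l = 2$ must be the complement of the other two
cases, and we have for the probability mass function of $l$:

$$p_l(l_0) = \left(1 - \frac{Q}{4}\right)^{8} \delta(l_0) + 2Q \left(1 - \frac{Q}{4}\right)^{7} \delta(l_0 - 1) + \left(1 - \left(1 - \frac{Q}{4}\right)^{7} \left(1 + \frac{7Q}{4}\right)\right) \delta(l_0 - 2)$$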

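By our assumptions of uniformity, all four output bundles have the same
loading probability mass function, and so the throughput of the switch is
$E[4l]$:

$$E[4l] = 4E[l] = 8Q \left(1 - \frac{Q}{4}\right)^{7} + 8 \left(1 - \left(1 - \frac{Q}{4}\right)^{7} \left(1 + \frac{7Q}{4}\right)\right)$$

The expected load on an input channel was $Q$, so that the total expected
input load is $8Q$. As for the crossbar, we use Equation (2.1) to derive the
probability of successful message transmission:

$$P_S = \frac{Q \left(1 - \frac{Q}{4}\right)^{7} + 1 - \left(1 - \frac{Q}{4}\right)^{7} \left(1 + \frac{7Q}{4}\right)}{Q} = \frac{1 + \left(Q - \left(1 + \frac{7Q}{4}\right)\right) \left(1 - \frac{Q}{4}\right)^{7}}{Q} = \frac{1 - \left(1 + \frac{3Q}{4}\right) \left(1 - \frac{Q}{4}\right)^{7}}{Q}$$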

The probability of successful message transmission is plotted against the 
source transmission probability in Figure 3.12. 
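As a numerical cross-check of the expression for the dilated switch, one can note that, before concentration, the load directed to one logical direction is binomial: each of the eight inputs independently contributes a message with probability Q/4. The sketch below is illustrative Python under that observation; the function names and default parameters are assumptions made for the example, and concentration is assumed to fold all loads above the dilation into the dilation.

```python
from math import comb


def direction_pmf(Q, inputs=8, directions=4, dilation=2):
    """Loading PMF of one logical direction after concentration."""
    q = Q / directions  # an input transmits AND is routed in this direction
    raw = [comb(inputs, n) * q**n * (1 - q) ** (inputs - n)
           for n in range(inputs + 1)]
    # Concentration: loads above the dilation are cut down to the dilation.
    return raw[:dilation] + [sum(raw[dilation:])]


def success_probability(Q, inputs=8, directions=4, dilation=2):
    """Throughput of the switch divided by its total expected input load."""
    pmf = direction_pmf(Q, inputs, directions, dilation)
    expected_load = sum(n * p for n, p in enumerate(pmf))
    return directions * expected_load / (inputs * Q)


def closed_form(Q):
    """Final expression for P_S for the 8 x 4, dilation 2 switch."""
    return (1 - (1 + 3 * Q / 4) * (1 - Q / 4) ** 7) / Q
```

The two functions agree to within rounding error over the range of Q; for Q = 1 both give a value of about 0.766.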

3.9.3 Performance of the 8x4, Dilation 2 Switch 

We can use our package of Mathematica procedures to write a procedure for 
the 8x4, dilation 2 switch, as follows: 

eightXfourD2[q_] :=
    Block[{bundled, switched, instages, outstages},
        instages = 3;     (* bundle three times: 2^3 = 8 input channels *)
        outstages = 2;    (* switch twice: 2^2 = 4 logical directions *)
        bundled = {(1-q), q};
        Do[bundled = bundle[bundled, bundled], {instages}];
        switched = bundled;
        Do[switched = switch[switched, .5], {outstages}];
        concentrate[switched, 2]]

Figure 3.12: Probability of successful message transmission plotted against
source transmission probability for an 8 X 4, dilation 2 switch, under a
uniform destination address distribution.



We will need a four-by-eight, input dilation two crossbar for the second
stage:

crossbar2x4in2[stageTwoPMF_] :=
    Block[{bundled, switched},
        bundled = stageTwoPMF;
        bundled = bundle[bundled, bundled];
        switched = bundled;
        Do[switched = switch[switched, .5], {2}];
        concentrate[switched, 1]]

Figure 3.13: The bandwidth of the 16 X 16 indirect cube made from 4x4
crossbars, as depicted in Figure 3.9, is shown dashed. The bandwidth of the
variant with a first stage made from 8x4, dilation 2 switches, as depicted
in Figure 3.10, is shown in solid black. Both are plotted against the source
transmission probability, for a uniform destination address distribution.



Now we can specify a procedure that yields as output the probability 
mass function of the load on a channel leading to a sink, given the probability 
that a source is transmitting: 

eightXfourD2indirect16[q_] :=
    Block[{firstStageOut},
        (* input of first stage is just q *)
        (* returns LPMF for 2-wide channel *)
        firstStageOut = eightXfourD2[q];
        (* now feed to 4x4 crossbars and return result *)
        crossbar2x4in2[firstStageOut]]

We plot in Figure 3.13 the bandwidth for the 16 X 16 indirect cube made 
from 4x4 crossbars, and that for the variant with a first stage made from 
8x4, dilation 2 switches. It will be seen that, as predicted, the performance 
of the network built with the dilated part is better. 



Chapter 4 

Analyzing the Performance 
of Multipath Networks 

4.1 Introduction 

In the previous chapter we have presented a method of analysis of Banyan 
network performance. But as we discussed in the introduction, Banyan 
networks, while amenable to analysis, are not intrinsically fault-tolerant. 

We present in this chapter a method of analysis of multipath networks. 
The performance parameters, and the model, are much the same as for 
Banyan networks; but the requirement of unique paths and thus indepen- 
dence of channel loads is removed. 

We leave behind the scheme of using elementary operations to build de- 
scriptions of switching elements, and instead directly derive the joint loading 
probability mass function for a set of channels leading from a switch. 

We also present a program that solves these equations exactly. As was 
mentioned in the introduction, the program cannot be used for large net- 
works, as its running time grows too quickly. We have found in the literature 
no polynomial-time program that computes the exact blocking probability 
of a multipath network. It may be that the problem is intractable, although 
we know of no proof of NP- or #P-completeness for it. 

Chapter 5 describes an approximation method for estimating solutions
to the equations, by making use of the exact solution for subproblems.

Figure 4.1: An 8x8 deterministically-interwired network with redundant
paths. There are a number of different paths from any source to any sink,
to increase fault-tolerance; redundant paths from message source 4 to sink
3 are highlighted. Routing is oblivious, with stochastic concentration. This
wiring scheme is from [2].

4.2 Extensions to the Model 



Figure 4.1 depicts a multipath network. We extend our model so that sources 
can have more than one channel to the network. A source still generates at 
most one message per cycle, which is directed to a stage 1 switch via one of 
the channels connecting the source to the network. The particular channel 
is selected randomly and with uniform probability. 

As before, the processes generating messages at the sources are
independent and memoryless. At the beginning of each cycle, each source $i$
generates a single message with some specified probability $p_i$, and otherwise
generates none. The network is synchronous: at each cycle messages move from stage
i to stage i + 1. It is also unbuffered: if a message is blocked at some stage, 
it is considered to be lost, and does not in any way affect the future states 
of the system. 

We use dilated switches, as described in Section 2.1, so that the set
of output channels of a switching element is divided into nonempty disjoint
subsets called logical directions. At each cycle, the switching element directs
each incoming message in one logical direction. As for Banyan networks,
we can choose the switching probabilities to model any single destination
address distribution. When we route messages in a logical direction, we use
stochastic concentration:

• If there are fewer messages or exactly the same number of messages 
directed in the logical direction as there are channels in that logical 
direction, then the channels that will carry the messages are chosen 
randomly, with uniform probability. 

• If there are more messages directed in a logical direction than there 
are channels in that direction, the messages that can be carried are 
chosen with uniform probability, and the other messages are blocked 
and lost. 

We note again that our network is self-routing: each message contains 
the information necessary to route the message from the source where it is 
injected to the sink that is its destination. No global information is used. In 
particular, this means that if we have several switches at a single stage, then 
given the loads on their input channels, the loads on the output channels of 
each switch are independent of the loads on the output channels of the other 
switches. This fact will be important in allowing us to factor joint loading 
probability mass functions. 

Having extended our model, let us return to the network of Figure 4.1. 
The switches here are 4x2, dilation 2 switches, except at the last stage, 
where they are simply 2x2 (dilation 1) switches. In the 4x2, dilation 2 
switches, the top two output channels constitute one logical direction, and 
the bottom two constitute another. 

As with Banyan networks, we wish to find the bandwidth and the proba- 
bility of successful message transmission of the networks we model. We find 
these parameters by finding the probability mass functions of the loads on 
channels leading to sinks. 

4.3 The Joint Probability Mass Function of an 
Aggregate of Channels 

Suppose that the input channels of a switch $S$, depicted in Figure 4.2, are
connected to several switches $R_1, R_2, \ldots, R_i$. Let us use the random variable
$L$ to denote the entire output loading configuration of $S$ at some specified
discrete time $t$, so that $P\{L = l\}$ is the probability that the output channels
of the switch have some particular loads designated in their aggregate by $l$
during cycle $t$.

Figure 4.2: Interstage wiring. Note that no subset of the channels depicted
need have mutually independent loads in a network with redundant paths.
The output channels on the right of the switch marked S are those whose
loads are referred to collectively in the text with the random variable L.

Now consider the loads on the input channels $C_{11}, \ldots, C_{iw}$ at cycle $t-1$.
(Because we assume a synchronous, unbuffered network with memoryless
processes generating the messages at the inputs, only the cycle before cycle
$t$ is of interest.) Let us denote the loads on the input channels at cycle $t-1$
with the random variables $L_{C_{11}}, \ldots, L_{C_{iw}}$.

In order to find the joint probability mass function of the loads on the
output channels of $S$, we condition on the loads on the input channels:

$$P\{L = l\} = \sum_{l_{C_{11}}, \ldots, l_{C_{iw}}} P\{L = l \mid L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\} \cdot P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\} \qquad (4.1)$$

where the sum is over all tuples $l_{C_{11}}, \ldots, l_{C_{iw}}$ with elements in {0, 1}.

Suppose that we can compute $P\{L = l \mid L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\}$.¹
In order to compute the probability of an output loading configuration of $S$
we will still need to find the joint probability mass function of the channel
loads $L_{C_{11}}, \ldots, L_{C_{iw}}$. In a Banyan network, it would be easy to compute this
function; it would simply be the product of the probability mass functions of
the loads on the individual channels, as channel loads in a Banyan network
are independent.² In a network with redundant paths, however, the loads
on these channels are not in general independent, as they may derive from
the same sources, and a message from a single source that has traveled one
path in the network cannot be traveling along another path. Thus another
method must be used.

¹ An expression for this conditional probability is derived in Section 4.4.

Figure 4.3: Channels referred to in Equation (4.2). Although the
probabilities of the message loads on the channels $C_{11}, \ldots, C_{iw}$ are not in general
independent, the loads on the subset of channels from each switching element
are independent given the message loads on the input channels $B_{11}, \ldots, B_{it}$.

In Figure 4.3, we see that the input channels $C_{11}, \ldots, C_{iw}$ of switch $S$
are the output channels of switches $R_1, \ldots, R_i$. Let us call the loads on
the input channels to these switches $L_{B_{11}}, \ldots, L_{B_{it}}$. We may now calculate
$P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\}$ by conditioning on the values of the
variables $L_{B_{11}}, \ldots, L_{B_{it}}$. We have

$$P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\} = \sum_{l_{B_{11}}, \ldots, l_{B_{it}}} P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}} \mid L_{B_{11}} = l_{B_{11}}, \ldots, L_{B_{it}} = l_{B_{it}}\} \cdot P\{L_{B_{11}} = l_{B_{11}}, \ldots, L_{B_{it}} = l_{B_{it}}\} \qquad (4.2)$$

where the sum is over all tuples $l_{B_{11}}, \ldots, l_{B_{it}}$ with elements in {0, 1}.

² See Section 3.2.

The loads on the output channels of these switches are not in general 
mutually independent. However, let us partition them into subsets according 
to the switch at which they originate, so that for the channels shown in 
Figure 4.3 we would have the subsets 

$$\{C_{11}, \ldots, C_{1u}\},\ \{C_{21}, \ldots, C_{2v}\},\ \ldots,\ \{C_{i1}, \ldots, C_{iw}\}$$

Note that, under the assumption of independence of message destinations,
and given the loads on the channels $B_{11}, \ldots, B_{it}$, the loads on the switch
output channel subsets are mutually independent. As mentioned in
Section 4.2, this is a consequence of the fact that the networks we model are
self-routing. No global information is used in routing messages through the 
network. 

That is, if we know the input loads for the switches $R_1, \ldots, R_i$, then the
loading probabilities for the output channels of each of the switches do not
depend on the output loads of any other switch. We may use this fact to
derive the joint probability mass function of the loads on the output channels
$C_{11}, \ldots, C_{iw}$ by conditioning on the input channel loads. We have then

$$\begin{aligned}
P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{iw}} = l_{C_{iw}}\} = \sum_{l_{B_{11}}, \ldots, l_{B_{it}}}
 & P\{L_{C_{11}} = l_{C_{11}}, \ldots, L_{C_{1u}} = l_{C_{1u}} \mid L_{B_{11}} = l_{B_{11}}, \ldots, L_{B_{1r}} = l_{B_{1r}}\} \cdot \\
 & P\{L_{C_{21}} = l_{C_{21}}, \ldots, L_{C_{2v}} = l_{C_{2v}} \mid L_{B_{21}} = l_{B_{21}}, \ldots, L_{B_{2s}} = l_{B_{2s}}\} \cdots \\
 & P\{L_{C_{i1}} = l_{C_{i1}}, \ldots, L_{C_{iw}} = l_{C_{iw}} \mid L_{B_{i1}} = l_{B_{i1}}, \ldots, L_{B_{it}} = l_{B_{it}}\} \cdot \\
 & P\{L_{B_{11}} = l_{B_{11}}, \ldots, L_{B_{it}} = l_{B_{it}}\}
\end{aligned} \qquad (4.3)$$

where the sum is once again over all tuples $l_{B_{11}}, \ldots, l_{B_{it}}$ with elements in
{0, 1}.

The subexpression $P\{L_{B_{11}} = l_{B_{11}}, \ldots, L_{B_{it}} = l_{B_{it}}\}$ can be evaluated
recursively by means of Equation (4.3), until the channels $B_{11}, \ldots, B_{it}$
correspond to sources. If these channels originate at message sources, then we
substitute instead the probability mass functions corresponding to sources. 
We may simply take the product of these functions for the sources in ques- 
tion, as in our model the processes generating messages at the sources are 
mutually independent. 

If source $i$, depicted in Figure 4.4, generates a message with probability
$p_i$ and has $k$ channels into the network, then we have for the loads on the
channels $C_1, \ldots, C_k$ the joint probability mass function

$$P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k}\} = \begin{cases} 1 - p_i & \text{if all the } l_C \text{ are 0} \\ p_i / k & \text{if exactly one } l_C \text{ is 1, and the rest are 0} \\ 0 & \text{otherwise} \end{cases}$$

Figure 4.4: The source E generates a single message at each cycle with
probability $p_i$. The message is transmitted with uniform probability over a
randomly picked channel in the set $\{C_1, \ldots, C_k\}$.
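The source distribution above is simple enough to state directly in code. The following Python sketch is illustrative only; the function name `source_channel_pmf` and the example probability are assumptions. Exact rational arithmetic is used so that normalization can be checked without rounding error.

```python
from fractions import Fraction
from itertools import product


def source_channel_pmf(p, loads):
    """Joint PMF of the loads on the k channels leaving one source.

    p is the probability that the source generates a message in a cycle;
    loads is a tuple of k values, each 0 or 1.
    """
    if any(load not in (0, 1) for load in loads):
        raise ValueError("channel loads must be 0 or 1")
    k = len(loads)
    ones = sum(loads)
    if ones == 0:
        return 1 - p   # the source generated no message
    if ones == 1:
        return p / k   # one message, on a uniformly chosen channel
    return 0 * p       # a source never emits two messages in one cycle


# Hypothetical source with three channels that transmits with probability 3/5.
p = Fraction(3, 5)
total = sum(source_channel_pmf(p, loads)
            for loads in product((0, 1), repeat=3))
```

Summing over all eight load tuples gives a total of exactly 1, as a joint probability mass function must.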



It remains now to evaluate the conditional probabilities in Equation (4.3).
Recall that each of these is the joint conditional probability that some subset
of the output channels of a dilated switch has a particular load, given that
the input channels have a particular load. We derive an equation for these
conditional probabilities in the next section.

4.4 Joint Probability Mass Functions of Dilated 
Switch Output Channels 

Suppose we have an $M \times N$, dilation $K$ switch. We may form the conditional
probability mass function of the loads on the output channels, given the
input load, by conditioning. Say that the random variable $L_{f,g}$ represents
the load on the $g$th channel in the $f$th logical direction.
We wish to evaluate the expression

$$P\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K} \mid L_{C_1} = l_{C_1}, \ldots, L_{C_M} = l_{C_M}\}$$

For an event $E$, define

$$Q\{E\} = P\{E \mid L_{C_1} = l_{C_1}, \ldots, L_{C_M} = l_{C_M}\} \qquad (4.5)$$

Of course $Q\{E\}$, like $P\{E \mid L_{C_1} = l_{C_1}, \ldots, L_{C_M} = l_{C_M}\}$, is a probability in
the usual sense; the definition is used to make completely clear the
significance of the further conditioning we perform below. We will condition on
the number of messages directed in each logical direction. If the random
variable $D_i$ represents the number of messages routed in logical direction $i$,
we have:



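$$Q\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K}\} = \sum_{d_1, \ldots, d_N} Q\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K} \mid D_1 = d_1, \ldots, D_N = d_N\} \cdot Q\{D_1 = d_1, \ldots, D_N = d_N\} \qquad (4.6)$$

where the sum is over all $N$-tuples $d_1, \ldots, d_N$ such that each $d_i \geq 0$ and
$\sum_{i=1}^{N} d_i = \sum_{j=1}^{M} l_{C_j}$.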

Now we consider the switching probability. We calculate the probabil- 
ities for the N logical directions using Equation (3.7) of Section 3.6 (of 
course, under uniform addressing each of these probabilities would be 1/N). 
Suppose that these probabilities are $q_1, q_2, \ldots, q_N$. By our assumption of
independence of message addresses, the probability that of the $\sum_{j=1}^{M} l_{C_j}$
arriving messages, $d_1$ are directed in direction 1, $d_2$ in direction 2, and so on,
is simply multinomial, so that

$$Q\{D_1 = d_1, \ldots, D_N = d_N\} = \binom{\sum_{j=1}^{M} l_{C_j}}{d_1, \ldots, d_N} q_1^{d_1} q_2^{d_2} \cdots q_N^{d_N} \qquad (4.7)$$



Now let us evaluate $Q\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K} \mid D_1 = d_1, \ldots, D_N = d_N\}$.
Say that $b_i$ is the number of messages output in direction $i$; that is,
$b_i = \sum_{g=1}^{K} l_{i,g}$. This number is not the same as $d_i$, because if there are more than
$K$ messages to be output in a $K$-wide direction, some messages are dropped
and lost. If $b_i$ messages are output, then under stochastic concentration the
channels are picked with uniform probability, and so the probability of any
single configuration will be $1/\binom{K}{b_i}$. Thus

$$Q\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K} \mid D_1 = d_1, \ldots, D_N = d_N\} = \begin{cases} 0 & \text{if for any } i,\ b_i \neq \min(d_i, K) \\ \prod_{i=1}^{N} \dfrac{1}{\binom{K}{b_i}} & \text{otherwise} \end{cases} \qquad (4.8)$$

where $b_i = \sum_{g=1}^{K} l_{i,g}$.

Combining Equations (4.5), (4.6), (4.7), and (4.8), we have

$$P\{L_{1,1} = l_{1,1}, \ldots, L_{N,K} = l_{N,K} \mid L_{C_1} = l_{C_1}, \ldots, L_{C_M} = l_{C_M}\} = \sum_{d_1, \ldots, d_N} \left[\prod_{i=1}^{N} \frac{1}{\binom{K}{b_i}}\right] \binom{\sum_{j=1}^{M} l_{C_j}}{d_1, \ldots, d_N} q_1^{d_1} q_2^{d_2} \cdots q_N^{d_N}$$

where $b_i = \sum_{g=1}^{K} l_{i,g}$, and the sum is over the $N$-tuples $d_1, \ldots, d_N$ such that
for each $d_i$, $\min(d_i, K) = b_i$, and $\sum_{i=1}^{N} d_i = \sum_{j=1}^{M} l_{C_j}$.

Of course, if the conditional joint probability of the load on a subset 
of the switch's output channels is desired, as opposed to all of the switch's 
channels, we can simply sum this expression over all the possible loads on 
the complement of the subset of channels whose loads are required, as the 
different configurations of the output channels are mutually exclusive events. 
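The conditional probability obtained by combining Equations (4.5) through (4.8) can be evaluated by direct enumeration. The sketch below is illustrative Python, not the Common LISP program described in Section 4.5; the function names and the small example switch are assumptions. It uses the fact that the expression depends on the input loads only through the number of arriving messages.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def multinomial(counts):
    """Multinomial coefficient (sum(counts))! / (counts[0]! counts[1]! ...)."""
    value = factorial(sum(counts))
    for c in counts:
        value //= factorial(c)
    return value


def output_config_probability(out_loads, arriving, q, K):
    """Probability of one output configuration given the input loads.

    out_loads: one tuple of K 0/1 channel loads per logical direction.
    arriving:  number of messages on the switch inputs (sum of the l_C).
    q:         switching probability of each logical direction.
    K:         dilation (channels per logical direction).
    """
    b = [sum(direction) for direction in out_loads]
    # d_i must satisfy min(d_i, K) = b_i: exactly b_i when b_i < K,
    # and any value from K upward when the direction is full.
    candidates = [[b_i] if b_i < K else range(K, arriving + 1) for b_i in b]
    total = Fraction(0)
    for d in product(*candidates):
        if sum(d) != arriving:
            continue
        term = Fraction(multinomial(d))     # Equation (4.7)
        for d_i, q_i in zip(d, q):
            term *= Fraction(q_i) ** d_i
        total += term
    for b_i in b:                           # Equation (4.8)
        total /= comb(K, b_i)
    return total


# Hypothetical 4 x 2, dilation 2 switch with three arriving messages
# and uniform switching probabilities.
q = [Fraction(1, 2), Fraction(1, 2)]
configs = [(c[:2], c[2:]) for c in product((0, 1), repeat=4)]
total = sum(output_config_probability(list(c), 3, q, 2) for c in configs)
```

In this example the sixteen possible output configurations have probabilities that sum to exactly 1, and the configuration with both upper channels and one lower channel loaded has probability 3/16.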

4.5 Automatic Calculation of Blocking
Probabilities

It will be clear that the automatic calculation of blocking probabilities by 
this means will require a great deal of time. Suppose we have a computer 
program that calculates the blocking probabilities for a network in the most 
obvious way, by finding the joint probability mass function of the channels 
leaving the final stage, using Equation (4.3) recursively. In the worst case, we 
can imagine a network where there are N stages and M dependent channels 
between each of the N stages, and the joint probability mass function of all 
of the channels between each of the stages must be formed. The domain
of the joint probability mass function for each stage then is of size $2^M$,
each value being calculated as a sum over $2^M$ terms. Assuming the time
to calculate each of the terms summed over in Equation (4.3) is $O(M)$, we
have then $O(N M 2^{2M})$ for the worst-case performance.

The performance on some networks can be better than this, however.
Suppose that we need to calculate $P\{L_{C_1} = l_{C_1}, \ldots, L_{C_n} = l_{C_n}\}$. Let $S(c)$
denote the set of source nodes from which messages can reach channel $c$.
If we can partition the set of channels $\{C_1, \ldots, C_n\}$ into disjoint subsets
$S_1, \ldots, S_m$ such that for any $C_1 \in S_i$ and $C_2 \in S_j$, $i \neq j$, $S(C_1) \cap S(C_2)$ is
empty, then the loads on the channels in each subset $S_i$ are independent of
the loads on the channels in any and all of the other subsets in the partition.³



³ As can be seen from the argument in Section 3.2.

Figure 4.5: The network of Figure 4.1, with switches labeled.



Then the expression $P\{L_{C_1} = l_{C_1}, \ldots, L_{C_n} = l_{C_n}\}$ can be factored into the
product of $m$ joint probability mass functions, one for each subset $S_i$. In the
limiting case of a Banyan network, a complete factoring will be possible for
every set of channels, and the summation itself can be factored, so that the
worst case performance for a Banyan network of $N$ stages with $M$ channels
between the stages becomes $O(NM)$.
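The partitioning step can be sketched in a few lines of Python. This is only an illustration, not the thesis's Common LISP implementation: the function name `independent_groups`, the dictionary representation of the source sets $S(c)$, and the example channel names are all assumptions made for the sketch. It merges channels whose source sets overlap, so that channels left in different groups share no source node.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple


def independent_groups(
    sources: Dict[Hashable, Set[Hashable]]
) -> List[FrozenSet[Hashable]]:
    """Group channels so that channels in different groups share no source.

    `sources[c]` plays the role of S(c): the set of source nodes from which
    messages can reach channel c (hypothetical representation).
    """
    # Each entry is (channels in the group, union of their source sets).
    groups: List[Tuple[Set[Hashable], Set[Hashable]]] = []
    for channel, reachable_from in sources.items():
        channels = {channel}
        union = set(reachable_from)
        untouched = []
        for other_channels, other_union in groups:
            if other_union & union:
                # Overlapping sources: the loads may be dependent, so merge.
                channels |= other_channels
                union |= other_union
            else:
                untouched.append((other_channels, other_union))
        groups = untouched + [(channels, union)]
    return [frozenset(channels) for channels, _ in groups]


# Hypothetical example: c1 and c3 share source 1, c3 and c2 share source 2.
example = {"c1": {0, 1}, "c2": {2, 3}, "c3": {1, 2}, "c4": {4}}
groups = independent_groups(example)
```

Each returned group corresponds to one factor of the joint probability mass function; a network in which every group is a single channel is the fully factorable case.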

A Common LISP program has been written to evaluate the joint prob- 
ability mass function of the loads on specified channels in a multistage in- 
terconnection network. The program is given a symbolic description of the 
interconnection network; this requires labeling the switching nodes of the 
network. We show a labeling of the nodes of the network of Figure 4.1 in
Figure 4.5. The input description for this network is shown in Figure 4.6.

The program uses the network representation to build an internal struc- 
ture in which (for example) information about independence of channel loads 
has been pre-computed, and channels have been assigned names generated 
from the names of their nodes of origin and destination. One can then
query the program for the probability mass function of interest. The result 
is numerical, as in the example below: 

> (setq d8x8 (parse-multistage-network
              deterministically-interwired-8x8-rep))
#<MULTISTAGE-NETWORK 8x8>
> (jlpmf '(tt6-o7-0 tt7-o7-0) d8x8)
(#S(JLPMF-PART CHANNELS (#<CHANNEL TT6-O7-0>
                         #<CHANNEL TT7-O7-0>)
               NUMBER-OF-CHANNELS 2
               VECTOR #(10321939817/17179869184
                        2931771091/17179869184
                        2931771091/17179869184
                        994387185/17179869184)))

(defparameter deterministically-interwired-8x8-rep
  ;; inputs first -- these don't get a conditional probability
  ;; function.
  '((i0 (a b) nil 1/2) (i1 (a b) nil 1/2)
    (i2 (a b) nil 1/2) (i3 (a b) nil 1/2)
    (i4 (c d) nil 1/2) (i5 (c d) nil 1/2)
    (i6 (c d) nil 1/2) (i7 (c d) nil 1/2)
    ;; stage 1 4x4's
    (a (e f g h) 4x2d2-cp-fun) (b (e f g h) 4x2d2-cp-fun)
    (c (e f g h) 4x2d2-cp-fun) (d (e f g h) 4x2d2-cp-fun)
    ;; stage 2 4x4's
    (e (tt0 tt1 tt2 tt3) 4x2d2-cp-fun)
    (f (tt0 tt1 tt2 tt3) 4x2d2-cp-fun)
    (g (tt4 tt5 tt6 tt7) 4x2d2-cp-fun)
    (h (tt4 tt5 tt6 tt7) 4x2d2-cp-fun)
    ;; stage 3 2x2's
    (tt0 (o0 o1) 2x2d1-cp-fun) (tt1 (o0 o1) 2x2d1-cp-fun)
    (tt2 (o2 o3) 2x2d1-cp-fun) (tt3 (o2 o3) 2x2d1-cp-fun)
    (tt4 (o4 o5) 2x2d1-cp-fun) (tt5 (o4 o5) 2x2d1-cp-fun)
    (tt6 (o6 o7) 2x2d1-cp-fun) (tt7 (o6 o7) 2x2d1-cp-fun)
    ;; outputs
    (o0) (o1) (o2) (o3) (o4) (o5) (o6) (o7)))

Figure 4.6: Symbolic description of the network of Figure 4.5. The
description specifies that during each cycle each source node generates a
message with probability 1/2.

Here we have calculated the joint probability mass function of the loads
on two channels leading from two 2x2 switches to sink o7 in the network
of Figure 4.1, given a probability of transmission in each message source
of 1/2, and under a uniform destination address distribution. The vector
component of the structure result above is indexed by integers in which the
bit with weight $2^i$ specifies the load of the $i$th channel (starting with $i = 0$)
in the vector of channels whose joint loading probability mass function was
required. Thus in the example above, the probability that no messages are
transmitted to sink o7 is 10321939817/17179869184 ≈ 0.601; the probability
that 1 message is transmitted only along the channel from switch TT7 to o7
is 2931771091/17179869184 ≈ 0.171, as is the probability that 1 message is
transmitted only along the channel to o7 from switch TT6. Finally, the
probability that both channels carry a message is 994387185/17179869184 ≈ 0.058;
we assume here, as in [2], that a message sink can receive two messages
during a single cycle.⁴
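The indexing convention can be made concrete with a short Python sketch that decodes the four-element vector printed above. The helper name `loads_for_index` and the dictionary `joint` are illustrative only and are not part of the thesis's program; the rational values are copied from the transcript.

```python
from fractions import Fraction

D = 17179869184  # common denominator in the transcript (2**34)

# VECTOR from the JLPMF-PART structure, in index order 0..3.
vector = [
    Fraction(10321939817, D),
    Fraction(2931771091, D),
    Fraction(2931771091, D),
    Fraction(994387185, D),
]


def loads_for_index(index, number_of_channels):
    """Bit i of `index` (weight 2**i) is the load of the i-th channel."""
    return tuple((index >> i) & 1 for i in range(number_of_channels))


# Map each load tuple (TT6-O7-0, TT7-O7-0) to its probability.
joint = {loads_for_index(k, 2): p for k, p in enumerate(vector)}

# Marginal probability that the first channel (index 0) carries a message.
marginal_first = joint[(1, 0)] + joint[(1, 1)]
```

The four entries sum to exactly 1, which is a quick consistency check on the transcript.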

To find the blocking probability of the network, we use Equation (2.1);
we form the probability of successful message transmission as the ratio of the
expected number of messages arriving at sinks to the expected number
of messages entering the network. Because of the symmetry of the network, all
the channels leading to sinks have identical loading probabilities, and so 
we can simply sum the expectations of their loads. We have then that the 
expected number of messages arriving at a single sink is 

\[
1 \cdot \frac{2931771091}{17179869184} + 1 \cdot \frac{2931771091}{17179869184}
+ 2 \cdot \frac{994387185}{17179869184} = \frac{981539569}{2147483648} \approx 0.457
\]



⁴ If a sink can receive only one message during a cycle, then the expected number of
messages received by a sink during a cycle will be

\[
1 - \frac{10321939817}{17179869184} = \frac{6857929367}{17179869184} \approx 0.40
\]

Figure 4.7: The probability of successful message transmission ($P\{\text{Success}\}$)
plotted against the source transmission probability ($p_i$) for the network
of Figure 4.1, under a uniform destination address distribution.



and the expected number of messages arriving at all sinks during any cycle is

\[
8 \cdot \frac{981539569}{2147483648} = \frac{981539569}{268435456} \approx 3.66
\]

Because the expected number of messages entering the network is $8 \cdot \frac{1}{2} = 4$,
we have that the aggregate probability of successful message transmission
in this network at a loading factor of 1/2 is

\[
\frac{E[\text{messages arriving at sinks}]}{E[\text{messages injected by sources}]}
= \frac{981539569}{1073741824} \approx 0.914
\]



and thus the blocking probability is approximately 0.086. 
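The arithmetic of this example can be checked with exact rationals. The following Python sketch uses only the probabilities printed in the transcript; the variable names are illustrative.

```python
from fractions import Fraction

D = 2 ** 34  # 17179869184, the denominator in the transcript

p_only_one_channel = Fraction(2931771091, D)  # exactly one given channel loaded
p_both_channels = Fraction(994387185, D)      # both channels to the sink loaded

# Expected number of messages arriving at one sink (a sink may accept two).
per_sink = 1 * p_only_one_channel + 1 * p_only_one_channel + 2 * p_both_channels

# By symmetry, all eight sinks have the same expectation.
all_sinks = 8 * per_sink

# Eight sources, each transmitting with probability 1/2.
injected = 8 * Fraction(1, 2)

success = all_sinks / injected
blocking = 1 - success
```

Here `success` evaluates to 981539569/1073741824 and `blocking` to roughly 0.086, in agreement with the figures in the text.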

We plot for the network of Figure 4.1 the probability of successful mes- 
sage transmission versus the probability that a source transmits in Fig- 
ure 4.7. 

The Common LISP implementation internally records joint probability 
mass functions so that they need not be recomputed. The implementation 
has been coded with some attention to performance, because, although the 
asymptotic performance is pessimal, the same code is used on subnetworks 
of larger networks in an approximation scheme.

Figure 4.8: A 16 × 16 network with random interwiring in the first and
second stages. The figure is from [2].



4.6 Applicability of Exact Calculation of Blocking Probabilities

We have presented a means of exact calculation of the blocking probability 
of a multistage network with redundant paths, and demonstrated its use in 
a program that automatically calculates blocking probabilities and exploits 
independence of channel loading probabilities where this is possible. 

The implementation described cannot be used to calculate the blocking 
probabilities of networks with much more path redundancy than the one of 
Figure 4.1. We might consider an implementation that could exploit the 
symmetry exhibited by some multistage networks, but such an implemen- 
tation could still not be used on a network like that in Figure 4.8, in which 
the wiring in the first and second stages is not symmetric and is in fact 
randomly generated. That such networks are of interest is demonstrated in
[16].

Thus we must seek approximate solutions to the problem. This we do
in the next chapter, where we will see that the exact equations and our
algorithm for solving them can be used to realize a faster approximation
method.



Chapter 5 

Approximating Performance Parameters for Multipath Networks

5.1 Introduction 

We saw in the previous chapter that exact calculation of the probability 
mass functions of channels leading to sinks in a multipath network could 
be very expensive. In this chapter we seek a method of approximation of 
performance parameters that will allow us to estimate to within a given 
error the loading probability of a channel leading to a sink. We will do 
this by using Monte Carlo methods, attempting both direct simulation of 
the network and also approximation of Equation (4.3), and compare the 
expense and error of the two methods. 

Our approximations use exactly the model we described in Sections 2.1 
and 4.2. We will find that this exact correspondence is important as we 
develop a method of approximating solutions to the equations by a combi- 
nation of simulation and exact methods. 

5.2 Direct Simulation 

In direct simulation, we simulate the transition through the network of a 
group of messages generated in a single cycle as follows: 



1. Messages are generated for the cycle being simulated by sources in
accordance with the source transmission probabilities $p_i$.

2. Addresses are picked according to the destination address distribution. 

3. The messages arrive at switching elements and are directed in logical 
directions in accordance with their addresses. 

• The direction of more messages in a logical direction than there
are channels in the direction is resolved by randomly choosing
which messages are blocked.

• Output channels within a logical direction are selected randomly, 
with each channel having the same probability of being selected 
to carry a message. 

4. Step 3 is repeated until we have calculated the loads of the channels 
whose states we are examining in the simulation. 

Note that, using the results of Section 3.6, we can generate the same 
distribution of messages as we do in step 2 by modifying step 3 to randomly 
pick, for each message, a logical direction in accordance with the switching 
probabilities of the switch. Our simulation algorithm then becomes: 

1. Messages are generated for the cycle being simulated by sources in
accordance with the source transmission probabilities $p_i$.

2. The messages arrive at switching elements and are directed in logical 
directions in accordance with the switching probabilities of the switch. 

• The direction of more messages in a logical direction than there 
are channels in the direction is resolved by truncating the number 
of messages to the dilation of the logical direction. 

• Output channels within a logical direction are selected randomly, 
with each channel having the same probability of being selected 
to carry a message. 

3. Step 2 is repeated until we have calculated the loads of the channels 
whose states we are examining in the simulation. 
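Step 2 of the second algorithm can be sketched for a single switch as follows. This is a minimal Python illustration under stated assumptions, not the program described in this document: the function names `simulate_switch` and `estimate_empty`, and the representation of loads as lists of 0s and 1s, are invented for the sketch. The small experiment at the end uses a single 2x2 switch of dilation 1 whose two sources transmit with probability 1/2, for which the probability that the first output channel is empty is $(3/4)^2 = 9/16$.

```python
import random


def simulate_switch(input_loads, switching_probs, dilation, rng):
    """Route the messages on a switch's inputs for one cycle.

    Returns one list of 0/1 channel loads per logical direction.
    """
    counts = [0] * len(switching_probs)
    for load in input_loads:
        if load:
            # Pick a logical direction according to the switching probabilities.
            direction = rng.choices(range(len(switching_probs)),
                                    weights=switching_probs)[0]
            counts[direction] += 1
    outputs = []
    for count in counts:
        carried = min(count, dilation)  # truncate to the dilation
        channels = [1] * carried + [0] * (dilation - carried)
        rng.shuffle(channels)           # each channel equally likely to be used
        outputs.append(channels)
    return outputs


def estimate_empty(trials, rng):
    """Estimate P{first output channel of a 2x2 dilation-1 switch is empty}."""
    hits = 0
    for _ in range(trials):
        inputs = [1 if rng.random() < 0.5 else 0 for _ in range(2)]
        outputs = simulate_switch(inputs, [0.5, 0.5], 1, rng)
        if outputs[0][0] == 0:
            hits += 1
    return hits / trials
```

A full network simulation would apply `simulate_switch` stage by stage, feeding each switch's output channel loads to the inputs of the switches they connect to.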

Simulation procedures for the random selections described above are 
straightforward. We describe them briefly here; details of these techniques 
can be found in introductory texts on probability models (e.g., [25]).

Message generation is performed by simulating the generation of a Bernoulli
random variable with the source transmission probability. Selection of
logical directions for a message can be performed by subdividing the half-open
interval [0, 1) into as many segments as there are logical directions, the length
of the segment for a logical direction being the same as the probability of
selecting that direction. A uniform random variable $U$ is generated and the
segment into which $U$ falls is taken as corresponding to the selected logical
direction. Finally, the random selection of output channels within a logical
direction can be performed in many ways; we do so by considering the $k$
channels to correspond to bits in a $k$-bit vector. If there are $n$ messages to
be directed in the logical direction, we set only the low $n$ bits in the vector
and then randomly permute the vector, which can be done in $O(k)$ steps.¹
The bits that are set after the permutation correspond to the channels that
carry messages.
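The three selection procedures just described can be sketched as follows. The function names are hypothetical, and the permutation shown is the standard Fisher-Yates shuffle, which is assumed here rather than taken from [25].

```python
import random


def bernoulli(p, rng):
    """Message generation: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def pick_direction(probs, rng):
    """Subdivide [0, 1) into segments of length probs[i]; return the index
    of the segment into which a uniform variate U falls."""
    u = rng.random()
    upper = 0.0
    for direction, p in enumerate(probs):
        upper += p
        if u < upper:
            return direction
    return len(probs) - 1  # guard against floating-point round-off


def place_messages(n, k, rng):
    """Set the low n of k bits, then randomly permute the k-bit vector.

    Assumes n <= k, i.e. the load has already been truncated to the dilation.
    """
    bits = [1] * n + [0] * (k - n)
    for i in range(k - 1, 0, -1):      # Fisher-Yates shuffle, O(k) steps
        j = rng.randrange(i + 1)
        bits[i], bits[j] = bits[j], bits[i]
    return bits
```

After `place_messages` returns, the positions holding a 1 are the channels that carry messages.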

5.3 Approximation of Performance Parameters 
Using Direct Simulation 

Repeated simulations can be used to approximate the parameter of interest
by the Monte Carlo method. Suppose that what we are interested in is the
probability that some set of channels $C$ has a particular loading configuration
$l$. We run some number $N$ of simulations, examining after each simulation
the loads on the channels $C$. If the channels have the loading configuration
$l$, the experiment is considered a "hit" and has value 1. If the channels do
not have the loading configuration $l$, the experiment is a "miss" with value
0. The mean of the values of the experiments is taken as an approximation
of the probability that the channels have the loading configuration $l$.

Now we describe direct simulation using standard notation, as the
textual description above would prove too unwieldy later in the chapter.²

Let $r_1, r_2, \ldots, r_k$ denote all the random variates that might be required
to perform a single direct simulation by the algorithm described above.³
Then let $R = (r_1, r_2, \ldots, r_k)$ be a vector of these random variates. Now let
$R_1, R_2, \ldots, R_n$ be a sequence of such vectors, identically and independently
distributed.



¹ Using an algorithm on pp. 474-476 of [25].
² We use the notation of [10].
³ That the number of random variates that might be required is finite will be clear when
we consider that only a finite number of outcomes from each experiment is possible.

If $f(R)$ is a function whose value is 1 where the channels $C$ whose states
we are examining in simulation have the load $l$, and 0 where they do not,
then the variables

\[ f_i = f(R_i) \]

are identically and independently distributed. If $E[f(R)] = \mu$, then

\[ \bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i \]

is an unbiased estimator of $P\{L_C = l\} = \mu$.

In order to calculate error bounds on our approximations, we will need
to know the variance of $\bar{f}$. Because the $f_i$ are Bernoulli,

\[ \mathrm{Var}(\bar{f}) = \frac{1}{n}\,\mu(1 - \mu) \]

because $\mu(1 - \mu)$ is the variance of $f(R)$. Unfortunately, this expression
will not be very useful in practice, as we do not a priori know $\mu$, or there
would be no need to estimate it. Thus we estimate the variance of $f(R)$,
using the unbiased estimator

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(f_i - \bar{f}\right)^2 \approx \mathrm{Var}(f(R)) \]

There are means of estimating the variance of $s^2$, but we will not use
these, as in practice the variance is small, and our error bounds are
conservative.

Given the estimate $s^2$ for $\mathrm{Var}(f(R))$, we may estimate the variance of
$\bar{f}$ as

\[ \mathrm{Var}(\bar{f}) \approx \frac{s^2}{n} \]

yielding a standard error of $s/\sqrt{n}$, which shows clearly that the error will
vary as the inverse root of the number of trials.
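As a small worked illustration of these estimators, the following Python sketch computes the sample mean, the unbiased variance estimate $s^2$, and the standard error from a list of hit/miss values. The function name `summarize` and the four-trial example are assumptions made for the sketch.

```python
import math


def summarize(values):
    """Return (mean, s2, standard_error) for a list of trial values.

    s2 uses the n - 1 divisor, matching the unbiased estimator in the text.
    """
    n = len(values)
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, s2, math.sqrt(s2 / n)


# Hypothetical run of four simulations: hit, miss, hit, hit.
mean, s2, standard_error = summarize([1, 0, 1, 1])
```

For this tiny sample the mean is 0.75, $s^2$ is 0.25 and the standard error is 0.25; quadrupling the number of trials would halve the standard error if $s^2$ stayed the same.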

5.4 Bounding the Number of Iterations 

To bound the number of iterations for which our simulation must run in
order to achieve a specified level of precision at a specified confidence, we
can use the Chebyshev Inequality, which states that if $X$ is a random variable
with mean $\mu$ and variance $\sigma^2$, then

\[ P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2} \]

Call the number of iterations performed $n$. Suppose that we wish to bound
by $c$ the probability that our estimate $\bar{f}$ deviates from the value $\mu$ being
estimated by more than some fraction $d$ of $\mu$. Because the variance of $\bar{f}$ is
$\sigma^2/n$ we have

\[
P\{|\bar{f} - \mu| \ge d\mu\} \le \frac{\sigma^2/n}{d^2\mu^2} = \frac{\sigma^2}{n d^2 \mu^2}
\tag{5.1}
\]

Now we can estimate the number of iterations we require by considering $c$,
the complement of our desired confidence level:

\[
\frac{\sigma^2}{n d^2 \mu^2} \le c
\quad\Longrightarrow\quad
n \ge \frac{\sigma^2}{c d^2 \mu^2}
\tag{5.2}
\]

In practice we use the estimate $s^2$ for $\sigma^2$ and the estimate $\bar{f}$ for $\mu$ in
calculating a projected number of iterations. We repeat the calculation after
each iteration of the algorithm and check to see whether we have performed
enough iterations to bound the error as desired.
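Equation (5.2) is simple enough to evaluate directly. The sketch below, with the hypothetical function name `chebyshev_iterations`, plugs in the exact probability from Section 4.5 as $\mu$ and the Bernoulli variance $\mu(1 - \mu)$ as $\sigma^2$; in an actual run the estimates $s^2$ and $\bar{f}$ would be used instead, so the projected count would differ slightly.

```python
import math


def chebyshev_iterations(variance, mean, d, c):
    """Smallest n satisfying n >= variance / (c * d**2 * mean**2)."""
    return math.ceil(variance / (c * d ** 2 * mean ** 2))


mu = 10321939817 / 17179869184   # exact value from Section 4.5, about 0.6008
sigma2 = mu * (1 - mu)           # variance of a Bernoulli trial

# 1% relative deviation (d = .01) at 95% confidence (c = .05).
n_required = chebyshev_iterations(sigma2, mu, 0.01, 0.05)
```

With these inputs the bound comes to roughly 133,000 iterations.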

The Chebyshev Inequality provides a conservative bound on the number
of iterations required. For large numbers $n$ of simulations, we expect
from the Central Limit Theorem that the distribution of $\bar{f}$ is approximately
normal. Thus, for example, we may have 95% confidence that $\bar{f}$ is in the
interval $\left[\mu - \frac{1.96\sigma}{\sqrt{n}},\ \mu + \frac{1.96\sigma}{\sqrt{n}}\right]$. We can use the Central Limit Theorem in the
same way that we did the Chebyshev Inequality, to calculate a projected
number of iterations required to bound the error as desired.

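By the Central Limit Theorem we have that

\[
P\left\{\frac{f_1 + f_2 + \cdots + f_n - n\mu}{\sigma\sqrt{n}} \le a\right\} \to \Phi(a)
\quad \text{as } n \to \infty
\]

so that

\[
P\left\{\bar{f} - \mu \le \frac{a\sigma}{\sqrt{n}}\right\} \to \Phi(a)
\]

\[
P\left\{|\bar{f} - \mu| \le \frac{a\sigma}{\sqrt{n}}\right\} \to \Phi(a) - \Phi(-a) = 2\Phi(a) - 1
\quad \text{as } n \to \infty
\tag{5.3}
\]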



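Substituting $d$ for $\frac{a\sigma}{\mu\sqrt{n}}$ and taking the complement, we have then

\[
P\left\{\left|\frac{\bar{f} - \mu}{\mu}\right| > d\right\} \to
2\left(1 - \Phi\!\left(\frac{\mu d \sqrt{n}}{\sigma}\right)\right)
\quad \text{as } n \to \infty
\tag{5.4}
\]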



where as before we use the estimate $s$ for $\sigma$ and the estimate $\bar{f}$ for $\mu$ in
practice. If we wish to bound by $c$ the probability that $\bar{f}$ varies from the
desired result by more than a fraction $d$, we may use our formula by calculating after
each iteration of the simulation the quantity
$2\left(1 - \Phi\!\left(\frac{\mu d \sqrt{n}}{\sigma}\right)\right)$ and halting
when it is less than $c$.⁴
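The stopping rule based on Equation (5.4) can be sketched with the standard normal distribution from Python's `statistics` module. The function names `clt_tail` and `should_stop` are illustrative and are not taken from the program described in this document.

```python
import math
from statistics import NormalDist


def clt_tail(mean, std, d, n):
    """Approximate P{|f_bar - mu| / mu > d} by 2 * (1 - Phi(mean*d*sqrt(n)/std))."""
    return 2 * (1 - NormalDist().cdf(mean * d * math.sqrt(n) / std))


def should_stop(mean, variance, d, c, n):
    """Halt once the estimated tail probability drops below c."""
    return clt_tail(mean, math.sqrt(variance), d, n) < c


# Running estimates of about the size seen for the 8x8 example.
tail_at_5000 = clt_tail(0.607, math.sqrt(0.239), 0.01, 5000)
tail_at_25000 = clt_tail(0.604, math.sqrt(0.239), 0.01, 25000)
```

With a mean near 0.6 and a variance near 0.239, the tail probability is about 0.38 after 5,000 iterations and just above 0.05 after 25,000, so the rule with $d = .01$ and $c = .05$ halts shortly after 25,000 iterations.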



5.5 An Example of Direct Simulation 

A program has been written to estimate the probability that a set of channels 
in a network will have a particular load, using the simulation algorithm of 
Section 5.2. Although simulation will let us estimate blocking probabilities 
for larger networks, and we will use a larger network later in the chapter, here 
we use the network of Figure 4.1, reproduced here in Figure 5.1. We do so 
because we know an exact result for this network (as shown in Section 4.5), 
and thus we can verify that in this example the simulation algorithm achieves 
the error bounds it should. 

We will estimate the probability that both of the channels leading to 
sink 7 in this network carry no messages. We had determined in Section 4.5 
that this probability (under uniform addressing, with each source having a 
probability of 0.5 of generating a message at each cycle) was 



\[
P\{L_{TT6\text{-}O7\text{-}0} = 0,\ L_{TT7\text{-}O7\text{-}0} = 0\} = \frac{10321939817}{17179869184} \approx 0.6008
\]



⁴ Of course, we could also use the inverse function $\Phi^{-1}$ to allow us to project a number
of iterations; but we have not done this here.



Figure 5.1: The 8x8 deterministically-interwired network of Figure 4.1. In 
the example we estimate the probability that the channels leading to sink 7 
carry no messages, under uniform addressing. 



We see in Figure 5.2 the result of running the program to estimate the re- 
quired probability, using Equation (5.4) to calculate the number of iterations 
necessary to achieve an estimate that lies within 1% of the actual value with 
95% confidence. 

We see that approximately 25,000 iterations are required to estimate the
value

\[ \frac{15222}{25211} \approx 0.604 \]

which is indeed within 1% of the exact solution. Using the more conservative
bound of Equation (5.2), the simulation runs for about 133,000 iterations,
yielding a result of 79825/132847 ≈ 0.6009.

5.5.1 The Expense of Direct Simulation 

For a network with $N$ stages with $M$ channels between each stage, an
iteration of the direct simulation algorithm of Section 5.2 runs in time $O(NM)$.
We can use Equation (5.2) to bound the total cost of estimating $\mu$ with a
deviation factor of $d$ and at a confidence of $1 - c$ as

\[
O\!\left(NM\,\frac{\sigma^2}{c d^2 \mu^2}\right) = O\!\left(NM\,\frac{1 - \mu}{c d^2 \mu}\right)
\]

because in the Bernoulli trials that make up the iterations of a direct
simulation we have $\sigma^2 = \mu(1 - \mu)$.

> (setq o7-channels (elements-named '(tt6-o7-0 tt7-o7-0) d8x8))
(#<CHANNEL TT6-O7-0> #<CHANNEL TT7-O7-0>)
> (simulate-multi-channel-loading-probability d8x8 o7-channels '(0 0)
    (make-clt-stopping-function .01 .05 5000))
Iteration 15; mean: .667; variance: .238; current confidence .042
Iteration 5000; mean: .607; variance: .239; current confidence .62
Iteration 10000; mean: .603; variance: .239; current confidence .782
Iteration 15000; mean: .605; variance: .239; current confidence .871
Iteration 20000; mean: .603; variance: .239; current confidence .919
Iteration 25000; mean: .604; variance: .239; current confidence .949
15222/25211
76026279/317784655
25211

Figure 5.2: Estimating by direct simulation the probability that both
channels leading to sink 7 in the network of Figure 5.1 carry no messages, under
uniform addressing and with a source transmission probability of 1/2.



5.6 Approximating a Solution to the Exact Equations

5.6.1 Approximating Equation (4.2) Across a Single Stage 

We saw in Chapter 4 that our method of exact calculation of blocking 
probabilities suffered from exponential increase in the expense of calculation 
as the number of dependent paths between stages increased. Equation (4.2) 
specified the probability that a set of output channels of switches, depicted 
in Figure 5.3, carried a particular load:

\[
P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k}\} =
\sum P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\} \cdot
P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\tag{5.5}
\]

where the sum is over all tuples $l_{B_1}, \ldots, l_{B_m}$ with elements in $\{0, 1\}$.

Figure 5.3: The network stage referred to by Equation (5.5), with input
channels $B_1, \ldots, B_m$ and output channels $C_1, \ldots, C_k$.

A method of approximate solution of this equation that suggests itself 
immediately is one of the following form: 

Rather than calculating this sum over all tuples $l_{B_1}, \ldots, l_{B_m}$,
calculate it exactly for only some of the tuples.

To be more precise, suppose that we define

\[
g(l_{B_1}, \ldots, l_{B_m}) =
P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\]

and we generate tuples $l_{B_1}, \ldots, l_{B_m}$ randomly in accordance with the
probability mass function $P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}$.

Now $g(l_{B_1}, \ldots, l_{B_m})$ is a random variable, and its expectation is

\begin{align*}
E[g(l_{B_1}, \ldots, l_{B_m})]
&= \sum_{l_{B_1}, \ldots, l_{B_m}} g(l_{B_1}, \ldots, l_{B_m}) \cdot P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\} \\
&= \sum_{l_{B_1}, \ldots, l_{B_m}} P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\} \cdot
P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\} \\
&= P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k}\}
\end{align*}

Thus we see that $g(l_{B_1}, \ldots, l_{B_m})$ is an unbiased estimator of the probability
we wished to estimate: $P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k}\}$.
We can readily calculate

\[
P\{L_{C_1} = l_{C_1}, \ldots, L_{C_k} = l_{C_k} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\]

by factoring the expression as in Equation (4.3). As we observed in
Chapter 4, although the loads on the channels $C_1, \ldots, C_k$ are not in general
independent, the loads on the subset of channels originating at each
switching element are independent given the message loads on the input channels
$B_1, \ldots, B_m$.

Figure 5.4: The solid box shows the stages of the network for which the
estimator $g$ performs an exact calculation; the dotted box shows the stages
of the network for which $h$ will perform an exact calculation.
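The unbiasedness of the single-stage estimator $g$ can be checked by exhaustive enumeration on a toy case. The Python sketch below is a hypothetical example, not one of the networks in this document: it assumes a single switch with two input channels $B_1$ and $B_2$ that independently carry a message with probability 1/2, and an output channel $C$ that each message avoids with probability 1/2, so that $g$ gives the conditional probability that $C$ is empty.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)


def input_pmf(loads):
    """P{L_B1 = l_B1, L_B2 = l_B2}: assumed independent, each loaded w.p. 1/2."""
    return half ** len(loads)


def g(loads):
    """P{L_C = 0 | input loads}: each message present avoids C w.p. 1/2."""
    return half ** sum(loads)


tuples = list(product((0, 1), repeat=2))

# E[g] is the sum over all input tuples, as in the derivation above.
expectation = sum(g(t) * input_pmf(t) for t in tuples)

# Variance of g versus the variance of the hit/miss variable for the same event.
variance_g = sum(g(t) ** 2 * input_pmf(t) for t in tuples) - expectation ** 2
variance_bernoulli = expectation * (1 - expectation)
```

In this toy case `expectation` is 9/16, which equals $(3/4)^2$, the probability that neither source sends a message toward $C$; the variance of $g$ is 19/256, against 63/256 for the Bernoulli variable.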



5.6.2 Approximating Equation (4.2) Across Multiple Stages 

We have then an estimator for the probability that a set of channels at 
some stage in the network carries a particular load. We may estimate the 
value of this probability by generating, in accordance with the appropriate 
probability distribution, input loads for the switches at which the channels 
originate. It is natural now to ask whether we might be able to extend the
estimation technique to cover more than one stage of the network.

The situation is as depicted in Figure 5.4. We have an estimator $g$ that
will allow us to estimate the probability of loads on the channels $C_1, \ldots, C_k$,
if we generate the loads on the input channels $B_1, \ldots, B_m$. We require an
estimator $h$ that will allow us to estimate the probability of loads on the
channels $O_1, \ldots, O_n$, by generating the loads on $B_1, \ldots, B_m$.

The estimator $h(l_{B_1}, \ldots, l_{B_m})$ will in fact simply be

\[
h(l_{B_1}, \ldots, l_{B_m}) =
P\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\tag{5.6}
\]

which, by an argument identical to that for $g$, will be an unbiased estimator
of $P\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n}\}$.

To evaluate the conditional probability, we define

\[
Q\{E\} = P\{E \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\]

If the input channels to the final-stage switches are called $D_1, \ldots, D_j$,
we now have

\[
Q\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n}\} =
\sum Q\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n} \mid L_{D_1} = l_{D_1}, \ldots, L_{D_j} = l_{D_j}\} \cdot
Q\{L_{D_1} = l_{D_1}, \ldots, L_{D_j} = l_{D_j}\}
\tag{5.7}
\]

which is similar to Equation (4.2). Note that

\[
Q\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n} \mid L_{D_1} = l_{D_1}, \ldots, L_{D_j} = l_{D_j}\} =
P\{L_{O_1} = l_{O_1}, \ldots, L_{O_n} = l_{O_n} \mid L_{D_1} = l_{D_1}, \ldots, L_{D_j} = l_{D_j}\}
\]

because, given the loads on the input channels $D_1, \ldots, D_j$, the loading
probabilities on the channels $O_1, \ldots, O_n$ are independent of the loads on
$B_1, \ldots, B_m$, so long as these are distinct from $D_1, \ldots, D_j$. Thus the
conditional probability inside the summation can be factored in the same fashion
as that in Equation (4.2).
We can evaluate the term

\[
Q\{L_{D_1} = l_{D_1}, \ldots, L_{D_j} = l_{D_j}\}
\]

using Equation (5.7) recursively, just as we did with Equation (4.2). In fact,
the only point at which the evaluation of $h(l_{B_1}, \ldots, l_{B_m})$ will differ from
that of a network comes when the channels $D_1, \ldots, D_j$ correspond to the
channels $B_1, \ldots, B_m$. At this point we will be evaluating

\[
Q\{L_{B_1} = l'_{B_1}, \ldots, L_{B_m} = l'_{B_m}\} =
P\{L_{B_1} = l'_{B_1}, \ldots, L_{B_m} = l'_{B_m} \mid L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}
\]

which will be 1 only when

\[
l_{B_1} = l'_{B_1}, \ldots, l_{B_m} = l'_{B_m}
\]

and will be 0 otherwise.

It is interesting to note that this last expression can thus be factored as

\[
Q\{L_{B_1} = l'_{B_1}, \ldots, L_{B_m} = l'_{B_m}\} =
P\{L_{B_1} = l'_{B_1} \mid L_{B_1} = l_{B_1}\} \cdots P\{L_{B_m} = l'_{B_m} \mid L_{B_m} = l_{B_m}\}
\]

which demonstrates that, given the input loads, the individual channel
probabilities are independent. In particular, we see that in evaluating $h(l_{B_1}, \ldots, l_{B_m})$
we may treat the channels $B_1, \ldots, B_m$ as the source channels of a network
the sources of which have transmission probability 0 when $l_B = 0$, and
transmission probability 1 when $l_B = 1$.

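That is, we see from Equation (4.4) that a channel leading from a source
that transmits with probability $l_{B_i}$ has a loading probability mass function

\[
P\{L_{B_i} = l'_{B_i}\} =
\begin{cases}
l_{B_i} & \text{if } l'_{B_i} = 1 \\
1 - l_{B_i} & \text{if } l'_{B_i} = 0
\end{cases}
\]

which, because $l_{B_i}$ and $l'_{B_i}$ can only be 0 or 1, is the same as

\[
P\{L_{B_i} = l'_{B_i} \mid L_{B_i} = l_{B_i}\}
\]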

Therefore we see that we may evaluate the conditional probability that
is the definition of the estimator $h$ by means of recursive application of
Equation (4.3) with a network whose sources $I_{B_1}, \ldots, I_{B_m}$ are connected to
channels $B_1, \ldots, B_m$. Source $I_B$ has source transmission probability 0 when
$l_B = 0$, and source transmission probability 1 when $l_B = 1$.

Thus a scheme for approximating Equation (4.2) is to pick a stage at
which to divide the network, and solve the network to the right of it exactly,
given source transmission probabilities equal to loads that we generate with
probabilities given by the joint probability mass function of the channels
where the division was made. This yields a sample value of the unbiased
estimator $h(l_{O_1}, \ldots, l_{O_n})$, whose expectation we may evaluate by a Monte
Carlo method.

5.6.3 Generating Random Variates from the Joint Probability Mass Function $P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}$

It remains to describe a method of generating random tuples $l_{B_1}, \ldots, l_{B_m}$ in
accordance with the probability mass function $P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}$.
The method is straightforward: we simply simulate the network using the
method of Section 5.2, and use the channel loads generated by the
simulation. Because we were careful that our simulation would correspond exactly
to the equations, the random variates generated this way will have the mass
function $P\{L_{B_1} = l_{B_1}, \ldots, L_{B_m} = l_{B_m}\}$.

Thus we see that one method of approximate solution of the exact
equations corresponds to combining simulation and exact calculation. In fact,
looked at another way, solving for the loading probabilities of the subnetwork
made up of the later stages is simply a means of reducing the variance of the
simulation, because, as we shall see, $h(l_{O_1}, \ldots, l_{O_n})$ will always have lower
variance than the corresponding Bernoulli variable in direct simulation.
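The combination of simulation and exact calculation can be illustrated on a deliberately small, hypothetical case: two input channels that each carry a message with probability 1/2, feeding a switch in which each message avoids the output channel of interest with probability 1/2. The Python sketch below generates the input loads by simulation, then records both the hit/miss value of direct simulation and the exact conditional probability for the same loads. The function names and the toy switch are assumptions made for the sketch, not the networks studied here.

```python
import random
from statistics import mean, variance


def sample_pair(rng):
    """One trial: (hit/miss by direct simulation, exact conditional probability)."""
    # Simulated front end: loads on the two input channels.
    loads = [1 if rng.random() < 0.5 else 0 for _ in range(2)]
    # Exact back end: P{output channel empty | loads} = (1/2) ** (messages present).
    exact = 0.5 ** sum(loads)
    # Direct simulation of the same stage: route each message at random.
    hit = 1
    for load in loads:
        if load and rng.random() < 0.5:
            hit = 0  # a message was routed onto the channel
    return hit, exact


def compare(trials, seed=0):
    """Return ((mean, variance) of hits, (mean, variance) of exact values)."""
    rng = random.Random(seed)
    pairs = [sample_pair(rng) for _ in range(trials)]
    hits = [h for h, _ in pairs]
    exacts = [e for _, e in pairs]
    return (mean(hits), variance(hits)), (mean(exacts), variance(exacts))
```

Both sample means estimate the same probability (9/16 in this toy case), but the sample variance of the exact conditional values is markedly smaller, which is the variance reduction described above.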

5.7 Examples of Approximation of the Exact Equations

A program has been written to use the approximation method described in 
the previous section. We will first examine some details of the performance 
of the method by considering some examples in detail. Then we will use the 
techniques we have described to compare the performance of three networks. 

5.7.1 Performance of the approximation method on some simple examples

For a first example, let us consider the familiar network depicted in
Figure 5.5. Here the estimator $h$ is used for only the final stage of the network.
The resulting run is shown in Figure 5.6.

We see that about 11,000 iterations are required to estimate a loading
probability of

\[ \frac{100811}{168112} \approx 0.5997 \]

as compared to about 25,000 iterations for the same error bound by
direct simulation. The reason for the difference is directly evident when we
compare the variance of $f(R)$ to that of $h(l_{O_1}, \ldots, l_{O_n})$:

\[
\mathrm{Var}(f(R)) \approx 0.239 \quad \text{but} \quad \mathrm{Var}(h(l_{O_1}, \ldots, l_{O_n})) \approx 0.098
\]

so that the variance has been reduced by a factor of about 2.43.

This is in fact a general result; the variance of $h$ will always be lower
than that of $f$, as we shall see in Section 5.9.

Figure 5.5: The 8x8 deterministically-interwired network of Figure 4.1.
The left box contains the two stages simulated in the first example of
Section 5.7.1; the right box contains the stage solved for exactly.



> (estimate-loading-probability d8x8-left-2 d8x8-right-1
    '((tt6-o7-0 0) (tt7-o7-0 0))
    (make-clt-stopping-function .01 .05 5000)
    '(g-tt6-sink h-tt6-sink g-tt7-sink h-tt7-sink))
Iteration 15; mean: 0.7; variance: .151; current confidence .056
Iteration 5000; mean: .597; variance: 0.1; current confidence .819
Iteration 10000; mean: .601; variance: .098; current confidence .944
100811/168112
231602841/2354912896
10507

Figure 5.6: Estimation by approximation method of the probability that
both channels leading to sink 7 in the network of Figure 5.5 carry no
messages, under uniform addressing and with a source transmission probability
of 1/2. Compare to Figure 5.2.

Figure 5.7: The 16 × 16 randomly-interwired network of Figure 4.8. The
network is from [2].



It will be clear from Equations (5.2) and (5.3) that, all other factors 
remaining equal, the number of iterations required to achieve an error bound 
is a linear function of the variance of the random variable whose expectation 
is being estimated. 
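The linear dependence on the variance can be checked against the two runs reported for the 8x8 network. The Python sketch below uses the exact variance estimates and iteration counts printed in Figures 5.2 and 5.6; the function `clt_iterations` and the value 1.96 for the 95% normal quantile are assumptions of the sketch rather than part of the program described here.

```python
from fractions import Fraction

# Variance estimates and iteration counts printed at the end of each run.
var_direct = Fraction(76026279, 317784655)    # Figure 5.2
var_hybrid = Fraction(231602841, 2354912896)  # Figure 5.6
iterations_direct = 25211
iterations_hybrid = 10507

variance_ratio = float(var_direct / var_hybrid)
iteration_ratio = iterations_direct / iterations_hybrid


def clt_iterations(variance, mean, d=0.01, z=1.96):
    """Projected n from the normal approximation: n = variance * (z / (d * mean))**2."""
    return variance * (z / (d * mean)) ** 2


projected_direct = clt_iterations(0.239, 0.604)
projected_hybrid = clt_iterations(0.098, 0.5997)
```

The variance ratio is about 2.43 and the ratio of iteration counts about 2.40; the small difference is expected, since the two runs halted with slightly different estimates of the mean.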

Now we try direct simulation and our approximation method on the
four-stage 16 × 16 network of Figure 4.8, reproduced in Figure 5.7.

We see in Figure 5.8 the results of using direct simulation to estimate
the probability that the top channel leading to sink 0 in the network of
Figure 5.7 carries no messages. In Figure 5.9 we see the results of using
the approximation method where exact calculation is used for only the final
stage of the network. Finally, in Figure 5.10 we see the results of using
the approximation method where exact calculation is used for the final two
stages of the network. In all three cases, uniform addressing was used, with
sources having transmission probabilities of 1/2.

Where direct simulation was used, the variance was ≈ 0.171; where exact
calculation was used for only the final stage, the variance was ≈ 0.072; where
exact calculation was used for the final two stages, the variance was ≈ 0.018.

> (simulate-multi-channel-loading-probability rnd16x16
    (elements-named '(tt0-o0-0) rnd16x16) '(0)
    (make-clt-stopping-function .01 .05 5000))
Iteration 15; mean: 0.8; variance: .171; current confidence 0.06
Iteration 5000; mean: .778; variance: .173; current confidence .815
Iteration 10000; mean: 0.78; variance: .172; current confidence 0.94
8392/10737
2459905/14409054
10737



Figure 5.8: Estimating the probability that the first channel leading to sink 0
in the network of Figure 5.7 carries no messages, by direct simulation.
10,737 iterations were required to achieve the error bound of ±1% with
95% confidence. Here uniform addressing was used, with sources having
transmission probabilities of 1/2.



> (estimate-loading-probability rnd16x16-left-3 rnd16x16-right-1
    '((tt0-o0-0 0))
    (make-clt-stopping-function .01 .05 5000)
    '(s-tt0-sink t-tt0-sink))
Iteration 15; mean: 0.75; variance: 0.08; current confidence .082
7101/9076
2976821/41177812
4538



Figure 5.9: Estimating the probability that the first channel leading to sink 0
in the network of Figure 5.7 carries no messages, by approximation where
exact calculation is used for only the final stage of the network. 4,538
iterations were required to achieve the error bound of ±1% with 95% confidence.
Uniform addressing was used, with sources having transmission probabilities
of 1/2.

> (estimate-loading-probability rnd16x16-left-2 rnd16x16-right-2
    '((tt0-o0-0 0))
    (make-clt-stopping-function .01 .05 5000)
    '(j-s-sink k-s-sink l-s-sink m-s-sink
      j-t-sink k-t-sink l-t-sink m-t-sink))
Iteration 15; mean: .762; variance: 0.02; current confidence .164
16577/21376
935754207/51132760064
1169



Figure 5.10: Estimating the probability that the first channel leading to sink 0
in the network of Figure 5.7 carries no messages, by approximation where
exact calculation is used for the final two stages of the network. 1,169
iterations were required to achieve the error bound of ±1% with 95% confidence.
Uniform addressing was used, with sources having transmission probabilities
of 1/2.

We see then that by using exact calculation for two stages of this network,
we reduce the number of iterations necessary by a factor of about 9. In the
next section we will see why we can always expect lower variance from $h$
than from $f$.

5.7.2 A comparison of the performance of three networks 

We present three example networks, all taken from [2]. The first network,
shown in Figure 5.11, is constructed from two non-dilated four-stage
networks connecting 16 endpoints. Because the degree of path-redundancy
is small (there are only two paths connecting any two endpoints), automatic
calculation of the exact probability of successful message transmission is
feasible.

The second network, shown in Figure 5.12, is a deterministically-interwired 
multipath network constructed from 4x2, dilation 2 crossbars, and 2x2 
crossbars. As can be seen in the figure, multiple paths connect any two 
endpoints, and calculation of the exact probability of successful message 
transmission is not quickly feasible on current uniprocessor workstations. 

The third network is the randomly-interwired multipath network of
Figure 5.7. Recall that, as with the deterministically-interwired network,
multiple paths connect any two endpoints, and, again, exact calculation of
performance parameters is too expensive to be feasible.

Figure 5.11: A 16 × 16 network constructed from two non-dilated networks
each connecting 16 endpoints. Redundant paths between an input and an
output are shown. The figure is from [2].

Figure 5.12: A 16 × 16 network with deterministic interwiring in the first
and second stages. Redundant paths between an input and an output are
shown. The figure is from [2].

[Plot axes: P{Success} versus Pi]

Figure 5.13: The probability of successful message transmission is shown
for each of the three networks in Figures 5.11, 5.12, and 5.7. The results
for the replicated network are shown in black, and are exact; the results
for the deterministically-interwired network are shown in grey, and those
for the randomly-interwired network are shown dashed. See the text for a
discussion of the results.

The performance of the three networks can nonetheless be compared
effectively using the exact method for the first and the approximation method
for the second and third. In the cases where the approximation method was
used, we have specified that the solution must lie within ±1% of the actual
value with 95% confidence.

We see in Figure 5.13 the probability of successful message transmission
for each of the three networks, and in Figure 5.14 the bandwidth, or
throughput, for each of the three networks. As was also found in [2]
(although using a much more complex model), the deterministically- and
randomly-interwired networks perform identically to within the resolution
of the approximation; and the replicated network performs considerably
worse than either.

Figure 5.14: The bandwidth, or throughput, is shown for each of the three
networks in Figures 5.11, 5.12, and 5.7. The results for the replicated
network are shown in black, and are exact; the results for the
deterministically-interwired network are shown in grey, and those for the
randomly-interwired network are shown dashed. See the text for a discussion
of the results.

5.8 Variance of Estimators in the Approximation
Method and in Direct Simulation

A simple and well-known theorem in Monte Carlo methods confirms what we 
have seen in our examples: the estimator $h$ will always have lower variance
than will the estimator $f$. In [10], the theorem is paraphrased as, "if, at any
point of a Monte Carlo calculation, we can replace an estimate by an exact 
value, we shall reduce the sampling error in the final result." This is why 
we can see our method of approximating the exact equations as a means 
of reducing the variance of the simulation. The exact equations are too 
expensive to solve exactly for large networks with many dependent paths, 
but knowledge and use of the exact equations on a subproblem makes it 
possible for us to realize in simulation the reduced sampling error promised 
by the theorem. 

The argument in [10] is short enough that we reproduce it here, adapted 
to our particular estimators. 

We note that $f(R)$ and $h(l_{B_1}, \ldots, l_{B_m})$ have the same mean, $\mu$.
Because $f$ is binomial, it has variance $\mu(1 - \mu)$. The variance of $h$
is given by

$$\mathrm{Var}(h) = E[h^2] - E[h]^2$$

Thus

$$\begin{aligned}
\mathrm{Var}(f) - \mathrm{Var}(h) &= \mu - \mu^2 - \left(E[h^2] - \mu^2\right) \\
&= \mu - E[h^2] \\
&= E[h - h^2]
\end{aligned}$$

Now, $h$, being a loading probability, lies in the interval $[0, 1]$, so that
everywhere $h \ge h^2$. But $h$ takes on with nonzero probability at least some
values that are not 0 or 1, because $h$ is not Bernoulli, so that for some tuples
$(l_{B_1}, \ldots, l_{B_m})$, $h - h^2 > 0$. Thus $E[h - h^2] > 0$ and
$\mathrm{Var}(f) > \mathrm{Var}(h)$, as we desired to show.
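
The identity Var(f) - Var(h) = E[h - h^2] is easy to check numerically. The Python sketch below uses a small, hypothetical distribution for h (the probabilities and values are made up for illustration and do not come from any network in this chapter) and exact rational arithmetic; the function name `compare_variances` is likewise illustrative.

```python
from fractions import Fraction as F


def compare_variances(h_dist):
    """h_dist is a list of (probability, value) pairs giving the
    distribution of the exact-calculation estimator h, each value
    being a loading probability in [0, 1].  f is the 0/1 estimator
    with the same mean."""
    assert sum(p for p, _ in h_dist) == 1
    mu = sum(p * h for p, h in h_dist)
    var_f = mu * (1 - mu)                             # f takes only 0 or 1
    var_h = sum(p * h * h for p, h in h_dist) - mu * mu
    gap = sum(p * (h - h * h) for p, h in h_dist)     # E[h - h^2]
    return mu, var_f, var_h, gap


# Hypothetical distribution of h, for illustration only.
h_dist = [(F(1, 4), F(1, 2)), (F(1, 2), F(3, 4)), (F(1, 4), F(1))]
mu, var_f, var_h, gap = compare_variances(h_dist)
print(mu, var_f, var_h, gap)   # 3/4 3/16 1/32 5/32
```

Only the values of h strictly between 0 and 1 contribute to the gap, which is why the inequality is strict whenever h is not itself a 0/1 variable.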

5.9 Expense of the Approximation Method 

One is tempted by the results of Section 5.7.1 to ask what happens if we again
increase the number of stages for which $h$ performs an exact calculation.
Although it seems likely that the variance would be reduced further, the
experiment is not likely to be worth our while, as the network for which
we would be calculating exactly the loading probabilities would now have a 
much larger number of redundant paths leading from its sources to sink 0. 
Thus we would be faced with the same problem of exponential growth as 
before. 

Our method can only reduce the expense of simulation so much, until the
exponential growth of the running time of each iteration with the number of
dependent channels dominates the savings in number of iterations. In fact,
the final stage of a network, considered by itself, will always constitute a
Banyan network, and so we can always calculate loading probabilities
across it at the same asymptotic expense as simulation: there are no
redundant paths, and the reduction of the number of iterations with the
variance will be realized in reduced running time.

The final two stages of the network of Figure 5.7 do not constitute a 
Banyan network, but the number of redundant paths between a source and 
a sink is small (two), and so in this case the running time is also significantly 
reduced. In many types of multipath networks larger final subnetworks 
constitute Banyan networks or have small numbers of dependent channels; 
in these networks it will be profitable to use exact calculation for more than 
one final stage. 

In a network with $N$ stages with $M$ channels between each stage, if exact
calculation is used for the final $K$ stages, then in the worst case, where the
load on every channel between two stages of switches in the final $K$ stages
is dependent on the loads on the other channels between those two stages
of switches, the running time of exact calculation for the final stages will
be $O(KM2^{2M})$. There will be $N - K$ stages simulated, at an expense of
$O((N - K)M)$ steps per simulation, so that the worst-case performance will
be

$$O\left(\left(KM2^{2M} + (N - K)M\right)\frac{\sigma^2}{c\,d^2\mu^2}\right)$$

where $c$ is the complement of the desired confidence, $d$ is the deviation
factor, and $\mu$ is the mean and $\sigma^2$ the variance of $h$, the result of
exact calculation.

The worst-case result is misleading, however, because in networks built in
practice, the subnetworks constituted by final stages have smaller numbers of
dependent channels than does the entire network. In fact, if the final stages
for which exact calculation is performed constitute a Banyan network, then
the running time of exact calculation is $O(KM)$, and the asymptotic running
time of the approximation is simply

$$O\left(NM\frac{\sigma^2}{c\,d^2\mu^2}\right)$$

where once again $c$ is the confidence complement, $d$ the deviation factor,
$\mu$ the mean and $\sigma^2$ the variance of $h$.
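
As a rough illustration of this cost model, the Python sketch below multiplies a per-iteration cost of N * M steps by an iteration count of variance / (c * d^2 * mean^2), which has the form of a Chebyshev bound. Constant factors are ignored, the function names are illustrative, and the sample arguments are made-up values rather than measurements from this chapter.

```python
import math
from fractions import Fraction as F


def iterations_needed(mean, variance, deviation, conf_complement):
    """Smallest n with variance / (n * (deviation * mean)^2) <= conf_complement."""
    return math.ceil(variance / (conf_complement * (deviation * mean) ** 2))


def banyan_case_cost(stages, channels, mean, variance, deviation, conf_complement):
    """Step count, up to constant factors: stages * channels steps per
    iteration (exact calculation over a Banyan tail plus simulation of
    the remaining stages), times the number of iterations."""
    return stages * channels * iterations_needed(
        mean, variance, deviation, conf_complement)


# Made-up values: N = 4 stages, M = 16 channels, mean 1/2, variance 1/10,
# deviation factor 1/10, confidence complement 1/20.
n = iterations_needed(F(1, 2), F(1, 10), F(1, 10), F(1, 20))
cost = banyan_case_cost(4, 16, F(1, 2), F(1, 10), F(1, 10), F(1, 20))
print(n, cost)   # 800 51200
```

Exact fractions are used for the sample arguments so that the ceiling is not disturbed by floating-point rounding.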



5.10 Conclusions 

We have developed methods of calculating the value of some performance
parameters for multistage networks - the normalized throughput and
probability of successful message transmission - by computing the loading
probabilities of channels leading to sinks.

We showed initially that independence of loads on channels in a Banyan
network allows a simple means of calculating channel loading probabilities
for these networks, and described a way of composing operations on loading
probability mass functions to derive expressions for the performance
parameters. We presented a program that derived such expressions and could be
used for numerical calculation of performance parameters.

We then saw that independence of loads on channels will not hold in
multipath networks, and developed equations for channel loading
probabilities in these networks. We showed that the number of equations that
must be solved by this method is exponential in the number of dependent paths
in the network, rendering the method impractical for large networks. We
presented a program that could be used to calculate channel loading
probabilities exactly for small networks, and discussed its performance in the
cases of multipath networks and Banyan networks.

We developed a method of approximate solution of the exact equations,
and compared its performance to that of direct simulation. We developed
programs for both our approximation method and direct simulation. We
saw that use of the exact equations will always afford some improvement in
performance, by reducing the variance of the estimator in question; and we
discussed cases where the reduction in running time will be quite substantial.

5.11 Future Work 

The literature on Monte Carlo methods contains many techniques for
reducing the variance of estimators. Some of these are particularly promising
for our application. For example, the use of stratified sampling, where the
strata are segregated by the number of messages generated by sources in
a particular cycle, should be easy to implement and promises a significant
reduction in variance.
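
One way such a stratification might look is sketched below in Python. It is a toy under stated assumptions, not an implementation from this work: sources transmit independently with a common probability p, so that given k transmitting sources the transmitting set is a uniformly random k-subset, and the quantity `first_two_idle` is a hypothetical stand-in for a real network simulation.

```python
import math
import random


def stratified_estimate(n_sources, p, g, samples_per_stratum, rng):
    """Estimate E[g(S)], where S is the set of sources that transmit in a
    cycle and each of n_sources transmits independently with probability p.
    Strata are indexed by k = |S|; each stratum mean is weighted by the
    binomial probability of k."""
    total = 0.0
    for k in range(n_sources + 1):
        weight = math.comb(n_sources, k) * p ** k * (1 - p) ** (n_sources - k)
        # Given |S| = k, S is a uniformly random k-subset of the sources.
        hits = sum(g(set(rng.sample(range(n_sources), k)))
                   for _ in range(samples_per_stratum))
        total += weight * hits / samples_per_stratum
    return total


def first_two_idle(active):
    """Toy quantity: 1 if neither source 0 nor source 1 transmits."""
    return 0 if (0 in active or 1 in active) else 1


rng = random.Random(1)
estimate = stratified_estimate(8, 0.5, first_two_idle, 400, rng)
exact = (1 - 0.5) ** 2   # both of two independent sources idle
print(estimate, exact)
```

Because the stratum weights are known exactly, the randomness in the number of transmitting sources no longer contributes to the variance of the estimate; only the within-stratum variation remains.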

We look forward to comparing more results of the application of these 
methods to the results of more faithful and complex simulations performed 
at M.I.T.'s Transit Group. The aim of the Transit Group's simulations is to 
select a network structure for implementation in a large-scale multiprocessor. 
We expect from the results cited in Section 1.4 that our model will be useful 
in comparing candidate networks. 



Appendix A 

Mathematica Procedures for 
Modelling Banyan Networks 



concentrate::usage =
  "concentrate[x, n] concentrates the LPMF x to n channels."

concentrate[x_, n_] :=
  (* get distribution for 0 through n-1 channels, and add
     as last element the sum of the rest of the channels. *)
  Append[Take[x, n], Apply[Plus, Drop[x, n]]]

discreteconvolution::usage =
  "discreteconvolution[x, y] treats x and y as 0-based
  vectors and returns their discrete convolution."

discreteconvolution[x_, y_] :=
  Block[{xlgth, ylgth, lgth},
    xlgth = Length[x];
    ylgth = Length[y];
    lgth = xlgth + ylgth - 1;
    (* in summation, portions of sequence with indices
       out of range for sequences must be treated as
       0. *)
    Table[Sum[If[k < 1 || k > xlgth ||
                 (n-k+1) < 1 || (n-k+1) > ylgth,
                 0,
                 (* because of the 0->1 index
                    translation, we increase the y-index
                    to shift the result sequence back
                    down to begin at 1. *)
                 x[[k]] y[[n-k+1]]],
              {k, xlgth}],
          {n, lgth}]]

bundle::usage =
  "bundle[x, y] forms the LPMF that results from bundling
  two input bundles with LPMFs x and y."

bundle[x_, y_] :=
  discreteconvolution[x, y]



switch::usage =
  "switch[x, p] returns the LPMF of an output bundle to
  which x is switched with probability p."

switch[x_, p_] :=
  Block[{lgth},
    lgth = Length[x];
    Table[Sum[x[[i+1]] Binomial[i, n] p^n (1-p)^(i-n),
              {i, n, lgth-1}],
          {n, 0, lgth-1}]]
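
For readers without Mathematica, the three operations above can be sketched in Python. This is an illustrative port and not part of the original program: LPMFs are plain 0-based lists whose entry i is the probability of i messages, and the small example at the end (two sources transmitting with probability 1/2, a switch with p = 1/2, concentration to one channel) is hypothetical.

```python
from fractions import Fraction as F
from math import comb


def concentrate(x, n):
    """Keep P(0), ..., P(n-1) and lump the remaining mass into P(n)."""
    return x[:n] + [sum(x[n:])]


def bundle(x, y):
    """LPMF of the merged bundle: discrete convolution of x and y."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def switch(x, p):
    """LPMF of an output bundle to which each message is switched
    independently with probability p (binomial thinning)."""
    top = len(x) - 1
    return [sum(x[i] * comb(i, n) * p ** n * (1 - p) ** (i - n)
                for i in range(n, top + 1))
            for n in range(top + 1)]


source = [F(1, 2), F(1, 2)]       # one channel carrying a message w.p. 1/2
two = bundle(source, source)      # [1/4, 1/2, 1/4]
routed = switch(two, F(1, 2))     # [9/16, 3/8, 1/16]
one = concentrate(routed, 1)      # [9/16, 7/16]
print(two, routed, one)
```

Using 0-based lists removes the index shift that the Mathematica version of the convolution has to account for.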